DeepSeek-AI and Peking University open-source DSpark, using speculative decoding to boost LLM inference throughput by up to 400%

Story Overview

DeepSeek-AI and Peking University researchers have released DSpark, a speculative decoding framework that adds a lightweight draft module to existing model checkpoints. The open-source release, published under an MIT license, includes the DeepSpec code, training tools, and a research paper. DSpark is already running in the production preview engines for the V4 Flash and Pro variants, and the speedups extend to Gemma and Qwen checkpoints as well.
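For readers new to the technique, here is a minimal sketch of generic greedy speculative decoding: a cheap draft model proposes a few tokens, and the target model verifies them, keeping the longest matching prefix. It is not taken from the DeepSpec code. The function names, the toy models, and the draft length `k` are all illustrative assumptions, and a real engine would verify the proposed tokens in one batched forward pass rather than a loop.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[Token], k: int) -> List[Token]:
    """Run one draft-then-verify round and return the newly accepted tokens."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposed: List[Token] = []
    for _ in range(k):
        proposed.append(draft(context + proposed))

    # 2. The target model checks each proposed position in order.
    accepted: List[Token] = []
    for tok in proposed:
        expected = target(context + accepted)
        if tok != expected:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every draft token matched, so the target adds one bonus token.
    accepted.append(target(context + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken,
             prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Generate max_new tokens using repeated speculative rounds."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        out.extend(speculative_step(target, draft, out, k))
    return out[:len(prompt) + max_new]


# Hypothetical toy models: the target counts upward modulo 10, and the
# draft agrees with it except after tokens where token % 4 == 3.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] % 4 == 3 else (ctx[-1] + 1) % 10


tokens = generate(toy_target, toy_draft, prompt=[0], max_new=12)
```

The useful property of this greedy variant is that the output is identical to what the target model would produce on its own. The draft only changes how many target steps are needed, which is where throughput gains of the kind reported for DSpark would come from.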

Original post

ℏεsam@Hesamation

when you put a company like DeepSeek under GPU restrictions, they invent a way to boost their throughput by 51% to even 400%.

now it makes sense how DeepSeek-V4-Pro is:
> ~28x cheaper than Opus 4.8
> ~34x cheaper than GPT 5.5

WHAT DOESN'T KILL YOU MAKES YOU STRONGER.

Daniel Han@danielhanchen

DeepSeek just released DSpark for V4 Flash & Pro, a new speculative decoding method boosting throughput by 51% to 400%!

DS also showed DSpark works well for other models like Gemma & Qwen

Github: https://github.com/deepseek-ai/DeepSpec
Paper: https://github.com/deepseek-ai/DeepSpec/blob/main/DSpark_paper.pdf
HF: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark

1:46 PM · Jun 27, 2026 · 15.9K Views

Performance Watch

Live traffic benchmarks reveal load-aware gains

Under real user load, the system reports per-user generation gains of 57 to 85 percent over prior baselines. Throughput gains start at 51 percent at lower token targets and rise to several times the baseline when latency SLAs tighten. A calibrated confidence scheduler dynamically trims the verification length based on GPU occupancy, so the gains hold without wasting compute on draft tokens that are unlikely to be accepted.
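To make the scheduling idea concrete, here is a hedged sketch of how a load-aware scheduler could trim the verification length. It is not DSpark's actual scheduler: the function name, the thresholds, and the rule itself (keep draft tokens while the estimated chance that the whole prefix is accepted stays above a threshold that rises with GPU occupancy) are assumptions made for illustration.

```python
from typing import Sequence


def verification_length(confidences: Sequence[float],
                        gpu_occupancy: float,
                        min_threshold: float = 0.2,
                        max_threshold: float = 0.8) -> int:
    """Return how many draft tokens to send to the target for verification.

    confidences: per-token draft confidences, assumed to be calibrated so
        that each value approximates the chance the target accepts that token.
    gpu_occupancy: fraction of GPU capacity in use, between 0.0 and 1.0.
    """
    if not 0.0 <= gpu_occupancy <= 1.0:
        raise ValueError("gpu_occupancy must be between 0.0 and 1.0")

    # A busier GPU gets a stricter threshold, so fewer speculative tokens
    # are verified when compute is scarce.
    threshold = min_threshold + (max_threshold - min_threshold) * gpu_occupancy

    prefix_survival = 1.0  # estimated chance the whole prefix is accepted
    length = 0
    for p in confidences:
        prefix_survival *= p
        if prefix_survival < threshold:
            break
        length += 1
    return length


# Hypothetical draft confidences for four proposed tokens.
draft_confidences = [0.9, 0.8, 0.7, 0.6]
idle_length = verification_length(draft_confidences, gpu_occupancy=0.0)
half_length = verification_length(draft_confidences, gpu_occupancy=0.5)
busy_length = verification_length(draft_confidences, gpu_occupancy=1.0)
```

With these made-up numbers, an idle GPU verifies all four draft tokens, a half-loaded GPU verifies three, and a fully loaded GPU verifies only the first.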

Developer Impact

Draft checkpoints ship for non-DeepSeek models

Alongside the core release, lightweight draft heads for Qwen3 variants and Gemma4-12B are available on Hugging Face, letting teams test the same semi-autoregressive drafting approach on their own checkpoints without retraining the target model.

Sentiment

Many users praised DeepSeek's DSpark release, crediting innovative engineering under resource constraints for its major throughput gains and for costs below electricity.

Positive: 100.0%. Negative: 0.0%. Based on 7 comments with sentiment.
