DeepSeek releases DeepSpec, a collection of accelerated open-weight models optimized for local deployment based on Qwen3 and Gemma4
Story Overview
DeepSeek has published the DeepSpec collection on Hugging Face, offering draft models meant to accelerate inference via speculative decoding when paired with Qwen3 and Gemma4 bases for local use.
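For readers new to the technique, here is a minimal sketch of greedy speculative decoding. It is not DeepSeek's implementation: the "models" are hypothetical toy functions that map a token list to the next token, and names such as `speculative_decode`, `toy_target`, and `toy_draft` are assumptions made for illustration. A small draft model proposes a few tokens, and the larger target model keeps the longest prefix it agrees with.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # hypothetical "model": context -> next token


def greedy_decode(model: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Plain autoregressive greedy decoding, used as the reference."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding: the draft proposes up to k tokens per round,
    the target verifies them and keeps the longest matching prefix."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1. The cheap draft model proposes tokens one at a time.
        proposal: List[Token] = []
        for _ in range(min(k, limit - len(out))):
            proposal.append(draft(out + proposal))

        # 2. The target checks every proposed position. A real system does this
        #    in one batched forward pass; the toy version loops for clarity.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # first mismatch: take the target's token
                break
        else:
            # Every draft token matched, so the target adds one more token if room remains.
            if len(out) + len(accepted) < limit:
                accepted.append(target(out + accepted))
        out.extend(accepted)
    return out


# Toy stand-ins: the draft agrees with the target except at every fifth position.
def toy_target(ctx: List[Token]) -> Token:
    return (sum(ctx) * 31 + len(ctx)) % 50


def toy_draft(ctx: List[Token]) -> Token:
    guess = toy_target(ctx)
    return guess if len(ctx) % 5 else (guess + 1) % 50


if __name__ == "__main__":
    print(speculative_decode(toy_target, toy_draft, [1, 2, 3], max_new=20))
```

In this greedy form the output matches what the target model would produce on its own, so any gain comes from verifying several tokens per target pass rather than from changing the text.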
Open weights and code hit Hugging Face together
The DSpark variants are publicly available for download, and the matching training and evaluation scripts are released under the MIT license on GitHub so developers can reproduce or extend the work.
Speed claims stay untested in public benchmarks
No independent numbers on tokens-per-second gains or quality trade-offs have surfaced yet, leaving the practical upside for local setups as an open variable.
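Until measured numbers appear, the possible upside can only be estimated. The sketch below uses the usual back-of-envelope model for speculative decoding, which assumes each drafted token is accepted independently with probability `alpha`. The function names and every input value are illustrative assumptions, not DeepSpec measurements.

```python
def expected_tokens_per_target_pass(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target forward pass when the draft proposes
    `gamma` tokens and each one is accepted independently with probability `alpha`."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        return float(gamma + 1)
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, draft_cost_ratio: float) -> float:
    """Rough wall-clock speedup over plain decoding.

    `draft_cost_ratio` is the cost of one draft step relative to one target
    step (an assumed input; it has to be measured on real hardware)."""
    round_cost = gamma * draft_cost_ratio + 1.0  # gamma draft steps plus one target pass
    return expected_tokens_per_target_pass(alpha, gamma) / round_cost


if __name__ == "__main__":
    # Placeholder inputs chosen only to show the shape of the trade-off.
    for alpha in (0.5, 0.7, 0.9):
        print(alpha, round(estimated_speedup(alpha, gamma=4, draft_cost_ratio=0.1), 2))
```

The estimate ignores memory bandwidth, batching, and sampling temperature, so it is a sanity check on published claims rather than a substitute for a tokens-per-second benchmark on the actual hardware.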
Reactions are largely positive: users are excited about DeepSeek's accelerated Gemma4-12B and Qwen3 models and describe the draft heads and local performance gains as a major step forward. A few point to a remaining limitation: DSpark reportedly does not handle linear attention, which posters guess is why Qwen 3.5 was left out.
Most Activity
9 PM in Beijing and someone in the whale office is dropping some dsparks, cc @teortaxesTex
Good guy DeepSeek gives us accelerated models. The most interesting one here is Gemma4-12B, I presume vision included. Might be the best local model in its weight class now, by some margin. Qwen 3.5 not included because DS[park] doesn't do linear attention, I guess.
@teortaxesTex full collection: https://huggingface.co/collections/deepseek-ai/deepspec

DeepSeek preparing release of DSpark, DFlash and Eagle draft models for Qwen3 and Gemma-4 variants

@xeophon Nice, but why not the newer Qwen family? 🫠

The released checkpoints are the ones used in the DSpark paper

@teortaxesTex wait the vision model is the play?
locals keep getting scarily good for self hosted tbh

@cedric_chee sooo many big tech words rn i am literally just sipping my cold brew in bed but im cheering u on ✨💕

@teortaxesTex A 3B draft head for a 12B model is wild. We're spending 25% of our parameter budget just to guess what the main model is going to say next, and honestly? It's worth every single token.

@teortaxesTex "because DS[park] doesn't do linear attention"
Their strongest remaining weakness.

@teortaxesTex Gemma has really taken the competition to a new level this time