DeepSeek just open-sourced another piece of their training stack.
DeepSpec: a full-stack codebase for training and evaluating speculative decoding models
The MIT-licensed release includes the DeepSeek-V4-Pro-DSpark draft model.
Users praise DeepSeek's open-sourcing of DeepSpec and DSpark for speculative decoding, citing the reported throughput gains of 51% to 400% and the boost to AI infrastructure.
DeepSeek releases their decoding module DSpark for V4 checkpoints, which improves a lot upon MTP-1, Eagle-3 and DFlash. Out of their vast goodwill, they also open source DeepSpec: "a codebase for training and evaluating draft models for speculative decoding".
official dsv4 spec dec and draft model @teortaxesTex
github: https://github.com/deepseek-ai/DeepSpec
huggingface: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark/tree/main
DeepSeek just released DSpark for V4 Flash & Pro, a new speculative decoding method boosting throughput by 51% to 400%!
DS also showed DSpark works well for other models like Gemma & Qwen
Github: https://github.com/deepseek-ai/DeepSpec
Paper: https://github.com/deepseek-ai/DeepSpec/blob/main/DSpark_paper.pdf
HF: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
DeepSeek just open-sourced another piece of their training stack.
DeepSpec: a full-stack codebase for training and evaluating speculative decoding models
https://github.com/deepseek-ai/DeepSpec
@scaling01 don't forget DSpark

It's been very lame how the industry has been failing to adopt good speculative decoding as the baseline. Just like Whale forced everyone onto MTP, now they may succeed with semi-AR drafting. @zephyr_z9 @antirez @_xjdr @norpadon does this look less BS than the previous one?

@danielhanchen this is so cool! does dspark have a comparison with DFlash, which seems to be very widely adopted in the industry?

@lily_gpupoor Oh yes I think they show it in the Qwen / Gemma table - (still reading the paper haha) page 11 I think

@danielhanchen Is it better or worse than JetSpec?

@danielhanchen AI infra industry gets accelerated by DeepSeek again. This is crazy.

@danielhanchen Just saw it. They also support DSpark and DFlash.

@danielhanchen MTP at training time, DSpark at inference time. It’s starting to go crazy fast folks

@danielhanchen that throughput jump is actually insane

@zephyr_z9 @antirez @_xjdr @norpadon V4 on differences between DFlash and DSpark

@teortaxesTex This efficiency is top-notch. It takes a real load off developers.

@teortaxesTex where is the paper

@teortaxesTex The pace of these open-source releases is pretty intense.

@danielhanchen will there be a quant version of this model?

@HCSolakoglu @danielhanchen The same? JetSpec vs. DeepSpec: both seem to be working on the same hypothesis.