DeepSeek releases DeepSpec, an open-source codebase for training speculative decoding draft models for DeepSeek-V4 · Digg

Posts from X

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

DeepSeek releases their decoding module DSpark for V4 checkpoints, which improves a lot upon MTP-1, Eagle-3 and DFlash. Out of their vast goodwill, they also open source DeepSpec: "a codebase for training and evaluating draft models for speculative decoding".

Zhipeng Huang@nopainkiller

official dsv4 spec dec and draft model @teortaxesTex

github: https://github.com/deepseek-ai/DeepSpec

huggingface: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark/tree/main

Daniel Han@danielhanchen

DeepSeek just released DSpark for V4 Flash & Pro, a new speculative decoding method boosting throughput by 51% to 400%!

DS also showed DSpark works well for other models like Gemma & Qwen

GitHub: https://github.com/deepseek-ai/DeepSpec

Paper: https://github.com/deepseek-ai/DeepSpec/blob/main/DSpark_paper.pdf

HF: https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro-DSpark

Lisan al Gaib@scaling01

DeepSeek just open-sourced another piece of their training stack.

DeepSpec: a full-stack codebase for training and evaluating speculative decoding models

https://github.com/deepseek-ai/DeepSpec

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@scaling01 don't forget DSpark

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

It's been very lame how the industry has been failing to adopt good speculative decoding as the baseline. Just like Whale forced everyone onto MTP, now they may succeed with semi-AR drafting. @zephyr_z9 @antirez @_xjdr @norpadon does this look less BS than the previous one?

lily zhang@lily_gpupoor

@danielhanchen this is so cool! does dspark have a comparison with DFlash, which seems to be very widely adopted in the industry?

Daniel Han@danielhanchen

@lily_gpupoor Oh yes I think they show it in the Qwen / Gemma table - (still reading the paper haha) page 11 I think

Hasan Can@HCSolakoglu

@danielhanchen Is it better or worse than JetSpec?

lily zhang@lily_gpupoor

@danielhanchen AI infra industry gets accelerated by DeepSeek again. This is crazy.

lily zhang@lily_gpupoor

@danielhanchen Just saw it. They also support DSpark and DFlash.

Noé Flandre@NoeFlandre

@danielhanchen MTP at training time, DSpark at inference time. It’s starting to go crazy fast folks

Strata@ChainZenit

@danielhanchen that throughput jump is actually insane

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@zephyr_z9 @antirez @_xjdr @norpadon V4 on differences between DFlash and DSpark

方星：返利窗口OKX丨45返 🚀@SUURLAHETTILAAT

@teortaxesTex This efficiency is top-notch; it really takes a load off developers.

高煜朗@tRpyNXsk3bDB6A7

@teortaxesTex where is the paper

安叫兽|Bird🕊️ 🔶 BNB@ajs6888

@teortaxesTex The pace of these open-source releases is pretty intense.

Ismael@notismaelvega

@danielhanchen will there be a quant version of this model?

keys 🧪@u1tra_instinct

@HCSolakoglu @danielhanchen The same? JetSpec vs. DeepSpec: similar approaches working on the same hypothesis.
