Prime Intellect's kalomaze argues DeepSeek-R1 validates outcome rewards and policy gradients over Monte Carlo Tree Search

Teortaxes▶️ (DeepSeek Twitter 🐋 die-hard fan 2023 – ∞)@teortaxesTex

… yeah, they won't make it

Looking back, I never understood the hype of the "deepseek moment" from last year. Distilling others' models is possible and easier than pushing the frontier. Like, nobody remembers Alpaca?

elie@eliebakouch

yeah sure must be this, and also maybe 3 years of open research and development

- DeepSeek LLM (2023) https://arxiv.org/abs/2401.02954
- DeepSeek-Coder (2023) https://arxiv.org/abs/2401.14196
- DeepSeekMoE (2024) https://arxiv.org/abs/2401.06066
- DeepSeekMath (2024) https://arxiv.org/abs/2402.03300
- DeepSeek-VL (2024) https://arxiv.org/abs/2403.05525
- DeepSeek-V2 (2024) https://arxiv.org/abs/2405.04434
- DeepSeek-Prover (2024) https://arxiv.org/abs/2405.14333
- DeepSeek-Coder-V2 (2024) https://arxiv.org/abs/2406.11931
- DeepSeek-Prover-V1.5 (2024) https://arxiv.org/abs/2408.08152
- DeepSeek-VL2 (2024) https://arxiv.org/abs/2412.10302
- DeepSeek-V3 (2024) https://arxiv.org/abs/2412.19437
- DeepSeek-R1 (2024) https://arxiv.org/abs/2501.12948
- DeepSeek-Prover-V2 (2025) https://arxiv.org/abs/2504.21801
- DeepSeek-V3.2-Exp (2025) https://github.com/deepseek-ai/DeepSeek-V3.2-Exp/blob/main/DeepSeek_V3_2.pdf
- DeepSeek-OCR (2025) https://arxiv.org/abs/2510.18234
- DeepSeekMath-V2 (2025) https://arxiv.org/abs/2511.22570
- DeepSeek-V3.2 (2025) https://arxiv.org/abs/2512.02556
- DeepSeek-V4 (2026) https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf

Elvis Nava@elvisnavah

Looking back, I never understood the hype of the "deepseek moment" from last year. Distilling others' models is possible and easier than pushing the frontier. Like, nobody remembers Alpaca?

elie@eliebakouch

only included the model releases of the deepseek series, so there is also Janus, and research papers like NSA, Engram, mHC, and others that i'm likely forgetting

not saying this is why there was a "deepseek moment" (people who discovered deepseek via mainstream media etc. are not really the target for research papers), but it's just a bit frustrating to reduce deepseek research to "distilling other models"

kalomaze@kalomaze

the distillation narrative that happened afterwards was of course a psyop in its own way. you don't produce an r1 at the time DeepSeek did by naively imitating a precollected corpus. ALSO, this was emphatically not at all the point of, say, R1-Zero as a research artifact

kalomaze@kalomaze

r1 was THE platonic validation of outcome rewards + pg (policy gradients) being everything you principally needed for a generative model to bootstrap towards capabilities for which there's no existing data distribution. there was a huge psyop at the time focused around MCTS (and search more broadly)

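To make the "outcome rewards + pg" claim concrete, here is a minimal sketch, not DeepSeek's R1 pipeline (which applies GRPO to a large language model): a toy generator of binary tokens trained with plain REINFORCE, where the only learning signal is a 0/1 reward for getting the whole sequence right. No search tree, no critic, and no demonstration data are involved. The target sequence, the independent per-position Bernoulli policy, the function names, and the hyperparameters are all hypothetical choices for illustration.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train_outcome_reward_pg(target, steps=2000, lr=0.5, seed=0):
    """REINFORCE on a toy 'generator' of binary tokens (hypothetical setup).

    The policy never sees the target sequence; it only receives an outcome
    reward of 1.0 when its whole sampled sequence matches, else 0.0.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(target)  # one independent Bernoulli logit per position
    for _ in range(steps):
        probs = [sigmoid(z) for z in logits]
        sample = [1 if rng.random() < p else 0 for p in probs]
        reward = 1.0 if sample == list(target) else 0.0  # outcome-only verifier
        # d/dz log Bernoulli(x; sigmoid(z)) = x - p, scaled by the reward
        for t, (x, p) in enumerate(zip(sample, probs)):
            logits[t] += lr * reward * (x - p)
    return logits


def success_prob(logits, target):
    """Probability that the policy samples exactly the target sequence."""
    prob = 1.0
    for z, bit in zip(logits, target):
        p = sigmoid(z)
        prob *= p if bit == 1 else 1.0 - p
    return prob


if __name__ == "__main__":
    target = (1, 0, 1)
    print(success_prob([0.0] * 3, target))  # 0.125: uniform guessing
    print(success_prob(train_outcome_reward_pg(target), target))
```

Because the reward is zero on failed samples, only successful rollouts move the logits, and each such update pushes every position toward the rewarded token: the bootstrapping loop in miniature. A realistic setup would also subtract a baseline to reduce gradient variance.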

kalomaze@kalomaze

@hdarshane the way i like to think about it is that the differentiable, bottom-up exploration is going to (eventually) be richer than the "optimal" local one at any given point, bc it's native to the baseline natural progress or solution rate, and so more representationally aware by default

elie@eliebakouch

(and obviously all the infra like DeepEP etc. that are probably as impactful if not more than the research papers)

kalomaze@kalomaze

@elvisnavah yeah but you uh don't predominantly pretrain a v3-scale model on raw OAI outputs, Alpaca-style, to begin with

Elvis Nava@elvisnavah

@kalomaze I am a big believer that without a good pretrain you don't go anywhere. And isn't that the thing that's easier to catch up with but harder to make good?

Elvis Nava@elvisnavah

@kalomaze Yeah sure maybe my take reads reductive

Elvis Nava@elvisnavah

@eliebakouch yeah, lesson learned. I was really talking about the former, but I'm rightly getting dunked on about the latter

Elvis Nava@elvisnavah

@teortaxesTex Hey man don't be hating, we're in Switzerland

EternalTwilight@eternal_twil

@kalomaze Tbf that’s still the good ol optimized search trajectory

wh@nrehiew_

@eliebakouch ignore the ragebaiters elie

davinci@leothecurious

@kalomaze Q*

EternalTwilight@eternal_twil

@kalomaze Yeah, that's fair; also add on top that r1 explicitly removed the critic

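The point about removing the critic can be illustrated with the group-relative advantage from GRPO (introduced in the DeepSeekMath paper and used for R1): instead of a learned value network, each sampled completion's outcome reward is normalized against the other completions for the same prompt. This sketch shows only that normalization step; the clipped policy-ratio objective and KL penalty of full GRPO are omitted, and the `eps` term, the population standard deviation, and the function name are assumptions of this sketch.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Critic-free advantages for one prompt's group of G sampled completions.

    A_i = (r_i - mean(r)) / (std(r) + eps): the group itself serves as the
    baseline, so no value network is trained.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (assumption of this sketch)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt, scored 1.0 (verified correct) or 0.0.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # approximately [1, -1, -1, 1]
    # If every completion gets the same reward, all advantages are zero.
    print(group_relative_advantages([1.0, 1.0, 1.0, 1.0]))
```

The second case is worth noting: under this scheme, a prompt the policy always solves (or always fails) contributes no gradient signal.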

josepha_mayo@josepha_mayo

@eliebakouch yep, GRPO, their own RL pipeline; i think RLVR also came from them

Hiranmay Darshane@hdarshane

@kalomaze http://hiranmay.com/blog/mcts-equivalence In some sense it is doing MCTS/search, though... Just that no one expected outcome rewards would get us there.

αιamblichus@aiamblichus

@eliebakouch The v4 DeepSeek models are truly excellent and unique, and the fact that people are not using them more despite their incredible price point is largely a result of a coordinated psyop by Big Token

Elvis Nava@elvisnavah

@eliebakouch They are cooking me in the QRTs (quote reposts)

kalomaze@kalomaze

@elvisnavah alibaba is and was far more dubious wrt messy synthetic data in pretraining; deepseek's pretraining data pipeline in general seems fairly ~pristine imo

Story Overview

Distillation claims still circulate

RL priorities may simplify