Prime Intellect's kalomaze argues DeepSeek-R1 validates outcome rewards and policy gradients over Monte Carlo Tree Search

elie@eliebakouch

yeah sure must be this, and also maybe 3 years of open research and development

DeepSeek LLM (2023) https://arxiv.org/abs/2401.02954
DeepSeek-Coder (2023) https://arxiv.org/abs/2401.14196
DeepSeekMoE (2024) https://arxiv.org/abs/2401.06066
DeepSeekMath (2024) https://arxiv.org/abs/2402.03300
DeepSeek-VL (2024) https://arxiv.org/abs/2403.05525
DeepSeek-V2 (2024) https://arxiv.org/abs/2405.04434
DeepSeek-Prover (2024) https://arxiv.org/abs/2405.14333
DeepSeek-Coder-V2 (2024) https://arxiv.org/abs/2406.11931
DeepSeek-Prover-V1.5 (2024) https://arxiv.org/abs/2408.08152
DeepSeek-VL2 (2024) https://arxiv.org/abs/2412.10302
DeepSeek-V3 (2024) https://arxiv.org/abs/2412.19437
DeepSeek-R1 (2024) https://arxiv.org/abs/2501.12948
DeepSeek-Prover-V2 (2025) https://arxiv.org/abs/2504.21801
DeepSeek-V3.2-Exp (2025) https://github.com/deepseek-ai/DeepSeek-V3.2-Exp/blob/main/DeepSeek_V3_2.pdf
DeepSeek-OCR (2025) https://arxiv.org/abs/2510.18234
DeepSeekMath-V2 (2025) https://arxiv.org/abs/2511.22570
DeepSeek-V3.2 (2025) https://arxiv.org/abs/2512.02556
DeepSeek-V4 (2026) https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf

Elvis Nava@elvisnavah

Looking back, I never understood the hype of the "deepseek moment" from last year. Distilling others' models is possible and easier than pushing the frontier. Like, nobody remembers Alpaca?

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

… yeah, they won't make it

elie@eliebakouch

only included the model releases of the deepseek series, so there is also janus, and research papers like NSA, Engram, mHC, and others that i'm likely forgetting

not saying this is why there was a "deepseek moment" (people who discovered deepseek via mainstream media etc.. are not really the target for research papers), but it's just a bit frustrating to reduce deepseek research to "distilling other models"

kalomaze@kalomaze

the distillation narrative that happened afterwards was of course a psyop in its own way. you don't produce an r1 at the time DeepSeek did by imitating a precollected corpus naively. ALSO: this was emphatically so not at all the point of, say, R1-Zero as a research artifact

kalomaze@kalomaze

r1 was THE platonic validation of outcome rewards + pg being everything you principally needed for a generative model to bootstrap towards capabilities for which there's no existing data distribution. there was a huge psyop at the time focused around MCTS (and search more broadly)

elie@eliebakouch

(and obviously all the infra like DeepEP etc. that are probably as impactful if not more than the research papers)

kalomaze@kalomaze

@hdarshane the way i like to think about it is that the differentiable bottom-up exploration is going to (eventually) be richer than the "optimal" local one at any given point, bc it's native to the baseline natural progress or solution rate, so more representationally aware by default

wh@nrehiew_

@eliebakouch ignore the ragebaiters elie

kalomaze@kalomaze

@elvisnavah yeah but you uh don't predominantly pretrain a v3 scale model on raw oai outputs alpaca style to begin with

Elvis Nava@elvisnavah

@kalomaze I am a big believer in the fact that without a good pretrain you don't go anywhere. And that is the thing it's easier to catch up with but harder to make good?

Elvis Nava@elvisnavah

@kalomaze Yeah sure maybe my take reads reductive

Elvis Nava@elvisnavah

@eliebakouch yeah lesson learned, I was really talking about the former but I am getting dunked rightly about the latter

Lorenz@Lorenzifix

@aiamblichus @eliebakouch Yes, they are amazing. Sadly, because of the new AI laws in China, possibly the last of the free-thinking, very intelligent, honest models from China. We, as a species, seem to often prefer lead over gold.

Elvis Nava@elvisnavah

@teortaxesTex Hey man don't be hating, we're in Switzerland

EternalTwilight@eternal_twil

@kalomaze Tbf that’s still the good ol optimized search trajectory

davinci@leothecurious

@kalomaze Q*

EternalTwilight@eternal_twil

@kalomaze Yeah, that's fair; also add on top that r1 explicitly removed the critic
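
For readers unfamiliar with the "removed the critic" point: a minimal sketch of a critic-free baseline in the spirit of GRPO, where each sampled completion's outcome reward is compared against the other samples for the same prompt instead of against a learned value function. The function name, the `eps` constant, the use of population standard deviation, and the example rewards are all assumptions for illustration, not DeepSeek's implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize outcome rewards within one group of completions for a single prompt.

    The group mean acts as the baseline, so no learned critic is needed.
    `eps` (an assumed constant) avoids division by zero when every
    completion in the group receives the same reward.
    """
    baseline = mean(rewards)
    spread = pstdev(rewards)  # assumption: population std over the group
    return [(r - baseline) / (spread + eps) for r in rewards]


# Hypothetical outcome rewards for four sampled completions:
# 1.0 if the final answer was verified as correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

In a policy-gradient update, each completion's token log-probabilities would then be weighted by its advantage, so correct completions are reinforced and incorrect ones are pushed down. When every completion in a group gets the same reward, all advantages are zero and that prompt contributes no gradient signal.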

josepha_mayo@josepha_mayo

@eliebakouch yep GRPO, their own rl pipeline, i think rlvr also came from them

Hiranmay Darshane@hdarshane

@kalomaze http://hiranmay.com/blog/mcts-equivalence In some sense it is doing MCTS/search, though... Just that no one expected outcome rewards would get us there.

αιamblichus@aiamblichus

@eliebakouch The v4 DeepSeek models are truly excellent and unique, and the fact that people are not using them more despite their incredible price point is largely a result of a coordinated psyop by Big Token

Elvis Nava@elvisnavah

@eliebakouch They are cooking me in the qrts
