
DeepSeek's February 2024 GRPO paper predates OpenAI's o1, suggesting its reasoning models stem from independent research instead of distillation

Western organizations largely overlooked the paper when it was first published.

Original post

GDP @bookwormengr

Are we seeing, in real time, the story change from 'they distilled us' to 'we accidentally shared the formula'?

If it were this straightforward, it would not have taken Ant/Gemini 5-6 months to come up with reasoning models.

Also, GRPO (the first approach to RLVR in LLMs) was released in Feb 2024, ahead of o1, though the world did not notice it. https://arxiv.org/pdf/2402.03300

DeepSeek R1, released in Jan 2025, seeded reasoning as per the workflow below and then went full RLVR.

It is hard to believe that the people who invented GRPO, mastered MoE, and invented Multi-head Latent Attention (and later DSA, mHC, CSA, HCA, etc.) would not have come up with it from first principles.

I would recommend the book "Where Good Ideas Come From". At any given stage of any given field, multiple people come up with the same ideas. The history of science is full of thousands of examples of this.

roon @tszzl

imo it is crazy that openai, years into the heated AGI race, released o1 and described in quite a bit of detail the principles of scaling RL over CoT. I wonder how much value was dispersed to the public that day

5:55 PM · Jun 18, 2026 · 4.1K Views
