Are we seeing, in real time, the story change from 'they distilled us' to 'we accidentally shared the formula'?
If it were this straightforward, it would not have taken Ant/Gemini 5-6 months to come up with reasoning models.
Also, GRPO (the first approach to RLVR in LLMs) was released in Feb 2024, ahead of o1, though the world did not notice it: https://arxiv.org/pdf/2402.03300
DeepSeek R1, released in Jan 2025, seeded reasoning using the workflow below and then went full RLVR.
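For readers who haven't looked at the GRPO paper, here is a minimal sketch of the two ideas referenced above, under stated assumptions. The "verifiable reward" is a hypothetical exact-match check on the last number in a completion (real RLVR setups use task-specific verifiers such as math answer checkers or unit tests). The GRPO part shows only the group-relative advantage: each sampled completion's reward is normalized against the mean and standard deviation of its own group, which stands in for a learned critic. Function names are illustrative, not DeepSeek's code, and the rest of the GRPO objective (clipped policy ratio, KL penalty) is omitted.

```python
import re
from statistics import mean, pstdev


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the last number in the completion equals the reference.

    Hypothetical checker for illustration only; an RLVR pipeline would use a
    verifier suited to the task (answer matching, test execution, etc.).
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if numbers and numbers[-1] == reference else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: (r_i - mean(group)) / std(group).

    Uses population std (an assumption; implementations differ). eps avoids
    division by zero when every completion in the group gets the same reward,
    in which case all advantages are 0 and the group contributes no signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # A group of four sampled completions for the same prompt ("What is 6 * 7?").
    completions = [
        "6 * 7 = 42",
        "The answer is 41",
        "Thinking... 6 * 7 is 42",
        "I am not sure",
    ]
    rewards = [verifiable_reward(c, "42") for c in completions]
    advantages = group_relative_advantages(rewards)
    print(rewards)
    print([round(a, 3) for a in advantages])
```

Correct completions end up with a positive advantage and incorrect ones with a negative advantage relative to their siblings, so the policy update pushes toward whatever reasoning produced verifiably correct answers, without training a separate value model.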
Hard to believe that the people who invented GRPO, mastered MoE, and invented Multi-head Latent Attention (and later DSA, mHC, CSA, HCA, etc.) would not have come up with it from first principles.
I would recommend the book "Where Good Ideas Come From". At any given stage of any field, multiple people come up with the same ideas; the history of science is full of thousands of examples of this.
imo it is crazy that OpenAI, years into the heated AGI race, released o1 and described the principles of scaling RL over CoT in quite a bit of detail. I wonder how much value was dispersed to the public that day.