R1 was THE platonic validation that outcome rewards + policy gradients (PG) are, in principle, everything you need for a generative model to bootstrap towards capabilities for which there's no existing data distribution. There was a huge psyop at the time focused around MCTS (and search more broadly).
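For concreteness, here is a minimal sketch of what "outcome rewards + PG" means mechanically. It is a toy illustration, not the actual R1 training recipe: the "model" is just a softmax over a handful of candidate answers, the reward is 1 only when the sampled answer is right, and all names and hyperparameters (`train`, `correct`, `lr`, the running baseline) are assumptions made up for this example. Note there is no demonstration data and no search, only sampling and a scalar outcome signal.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(num_actions=5, correct=3, steps=2000, lr=0.5, seed=0):
    """REINFORCE on a toy one-step task with a binary outcome reward."""
    rng = random.Random(seed)
    logits = [0.0] * num_actions  # policy parameters (hypothetical toy policy)
    baseline = 0.0                # running mean reward, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an "answer" from the current policy.
        action = rng.choices(range(num_actions), weights=probs)[0]
        # Outcome reward: only the final result is scored.
        reward = 1.0 if action == correct else 0.0
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. logit a is 1[a == action] - probs[a].
        for a in range(num_actions):
            grad = (1.0 if a == action else 0.0) - probs[a]
            logits[a] += lr * advantage * grad
    return softmax(logits)


final_probs = train()
```

The policy starts uniform and ends up putting nearly all of its mass on the rewarded answer, purely from sampled outcomes.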
Looking back, I never understood the hype around the "DeepSeek moment" from last year. Distilling others' models is possible, and easier than pushing the frontier. Like, does nobody remember Alpaca?







