We've always intuited that verification is easier than generation. Chen's new work shows that explicitly training for it unlocks massive self-improvement:
📈 14× boost in test-time refinement on hard reasoning
🚀 30% gain beyond the RL plateau at training time
Self-improvement depends on whether a model can judge its own work. We usually train models to generate better. Why not train them to verify just as well?
We show how to train models to pinpoint their errors, and the same model nearly doubles its accuracy on hard math and jumps 14× on scientific reasoning. 🧵 1/5



