Self-improvement depends on whether a model can judge its own work. We usually train models to generate better, so why not train them to verify just as well?
We show how to train models to pinpoint their own errors. The same model then nearly doubles its accuracy on hard math and jumps 14x on scientific reasoning. 🧵1/5
