
Chen Wu trains AI models to pinpoint their own errors, nearly doubling hard-math accuracy and yielding a 14x gain on scientific reasoning

The technique uses a generator-verifier feedback loop.

Chen Wu @ChenHenryWu

Self-improvement depends on whether a model can judge its own work. We usually train models to generate better - why not train them to verify just as well?

We show how to train models to pinpoint their errors, and the same model nearly doubles its accuracy on hard math and jumps 14x on scientific reasoning. 🧵1/5

12:52 PM · Jun 5, 2026 · 25.5K Views
Chen Wu @ChenHenryWu

Next, we ask whether verifier-in-the-loop (ViL) training can improve the generator itself, especially after standard RLVR has saturated.

With test-time verification, we see a 33% pass@1 gain, as expected because the generator learns to use verifier output.

The surprise: even without a verifier at inference, standalone pass@1 improves by 30%. 4/5
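
For readers unfamiliar with the metric, here is a minimal sketch of how pass@1 is commonly computed, using the standard unbiased pass@k estimator over sampled solutions. The thread does not say whether the 33% and 30% figures are relative or absolute gains, so the sketch shows both readings. The sample counts are made up for illustration and are not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct.

    Computes 1 - C(n - c, k) / C(n, k); for k = 1 this reduces to c / n.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


def relative_gain(before: float, after: float) -> float:
    """Gain as a fraction of the baseline score."""
    return (after - before) / before


def absolute_gain(before: float, after: float) -> float:
    """Gain as a raw difference between the two scores."""
    return after - before


# Hypothetical counts (NOT from the paper): 3 of 10 samples correct
# before training, 4 of 10 correct after.
baseline = pass_at_k(n=10, c=3, k=1)
improved = pass_at_k(n=10, c=4, k=1)
print(f"relative gain: {relative_gain(baseline, improved):.1%}")
print(f"absolute gain: {absolute_gain(baseline, improved):.1%}")
```

With these illustrative counts the same improvement reads as roughly a one-third relative gain but only a ten-point absolute gain, which is why it helps to check the paper for which convention is used.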

Chen Wu @ChenHenryWu

We want verification to tell not just whether a solution is wrong, but where and why, so self-improvement has a direction. But how should we train for that?

Our key idea: show the model the reference solution so it has more context to teach itself to reason about where the errors are and why. We call this self-trained verification (STV). We then put it in the loop to improve at both test and training time. 2/5

Chen Wu @ChenHenryWu

At test time, the trained verifier makes refinement actually scale. STV roughly doubles pass@1 on hard math, and with enough verification compute, STV-guided 8B even beats a 4× larger model. 3/5
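
The test-time refinement described here can be pictured as a generate, verify, revise loop. The sketch below is only an illustration of that general pattern, not the authors' implementation: `Verdict`, `refine_with_verifier`, `toy_generate`, and `toy_verify` are hypothetical names, and the toy functions stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Verdict:
    correct: bool
    feedback: str  # where and why the solution is wrong; empty if correct


# (problem, previous solution, verifier feedback) -> new solution
Generator = Callable[[str, Optional[str], Optional[str]], str]
# (problem, solution) -> verdict
Verifier = Callable[[str, str], Verdict]


def refine_with_verifier(
    problem: str, generate: Generator, verify: Verifier, max_rounds: int = 4
) -> Tuple[str, int]:
    """Revise a solution until the verifier accepts it or the budget runs out.

    Returns the last solution and the number of rounds used.
    """
    previous: Optional[str] = None
    feedback: Optional[str] = None
    for round_idx in range(1, max_rounds + 1):
        solution = generate(problem, previous, feedback)
        verdict = verify(problem, solution)
        if verdict.correct:
            return solution, round_idx
        previous, feedback = solution, verdict.feedback
    return solution, max_rounds


# Toy stand-ins (assumptions for illustration only).
def toy_generate(problem: str, previous: Optional[str], feedback: Optional[str]) -> str:
    a, b = (int(t) for t in problem.split("+"))
    if feedback is None:
        return str(a + b - 10)  # first attempt deliberately drops a carry
    return str(a + b)  # the revision uses the feedback to fix the error


def toy_verify(problem: str, solution: str) -> Verdict:
    a, b = (int(t) for t in problem.split("+"))
    if int(solution) == a + b:
        return Verdict(True, "")
    return Verdict(False, "tens digit is wrong: the carry was dropped")


print(refine_with_verifier("17 + 25", toy_generate, toy_verify))
```

Raising `max_rounds` is the knob that corresponds to spending more verification compute; in this toy setup the loop stops at the second round because the verifier's feedback is enough to fix the answer.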

Chen Wu @ChenHenryWu

We see learning to verify as a promising lever for training- and test-time self-improvement, perhaps as important as scaling data and generators.

Check our paper for results on more domains, weak-to-strong verification, and ablations.

📄 paper: http://arxiv.org/abs/2605.30290 💻 code: https://github.com/ar-forum/stv 🌐 site: http://ar-forum.github.io/stv-webpage

This work was done at CMU with @AdtRaghunathan. 5/5
