
Chen Wu trains AI models to verify their own work, nearly doubling math accuracy and yielding 14x gains in scientific reasoning

The technique uses a generator-verifier feedback loop.



Original post

Aditi Raghunathan

Chen Wu@ChenHenryWu

Self-improvement depends on whether a model can judge its own work. We usually train models to generate better; why not train them to verify just as well?

We show how to train models to pinpoint their errors, and the same model nearly doubles its accuracy on hard math and jumps 14x on scientific reasoning. 🧵1/5

12:52 PM · Jun 5, 2026 · 25.5K Views


Posts from X


Chen Wu@ChenHenryWu

Next, we ask whether verifier-in-the-loop (ViL) training can improve the generator itself, especially after standard RLVR has saturated.

With test-time verification, pass@1 improves by 33%, as expected, since the generator has learned to use the verifier's output.

The surprise: even without a verifier at inference, standalone pass@1 improves by 30%. 4/5
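The thread reports pass@1 gains without saying here whether 33% and 30% are relative or absolute. As a reference point, a minimal sketch of the widely used unbiased pass@k estimator (from the Codex evaluation paper) and of a relative-gain calculation follows. It is an assumption that the paper computes pass@1 this way, and the sample counts below are made up for illustration, not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples per problem, c of them correct."""
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def relative_gain(before: float, after: float) -> float:
    """Relative improvement over a baseline, e.g. 0.33 means a 33% gain."""
    return (after - before) / before


# Hypothetical counts for one problem (not from the paper).
baseline = pass_at_k(n=16, c=6, k=1)  # reduces to c / n for k = 1
improved = pass_at_k(n=16, c=8, k=1)
print(f"{baseline:.3f} -> {improved:.3f}, relative gain {relative_gain(baseline, improved):.0%}")
```

For k = 1 the estimator reduces to the fraction of correct samples, so a move from 6/16 to 8/16 correct is a 33% relative gain but only a 12.5-point absolute gain; which reading the thread intends is worth checking in the paper.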



Chen Wu@ChenHenryWu

We want verification to tell us not just whether a solution is wrong, but where and why, so that self-improvement has a direction. But how should we train for that?

Our key idea: show the model the reference solution, giving it more context to teach itself to reason about where the errors are and why. We call this self-trained verification (STV). We then put the verifier in the loop to drive improvement at both test time and training time. 2/5
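To make "where and why" concrete, here is a minimal sketch of a verifier input that optionally includes the reference solution, plus a parser that turns the verifier's reply into a localized verdict (first faulty step and reason). The prompt wording, the `CORRECT` / `ERROR at step <k>: <reason>` reply format, and the names `Verdict`, `build_verifier_prompt`, and `parse_verdict` are all hypothetical; the thread does not describe STV's actual prompt format or exactly how the reference is used during training.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Verdict:
    correct: bool
    error_step: Optional[int]  # 1-based index of the first faulty step, if known
    reason: str


def build_verifier_prompt(problem: str, steps: List[str], reference: Optional[str]) -> str:
    """Lay out problem, optional reference solution, and numbered candidate steps."""
    lines = [f"Problem: {problem}"]
    if reference is not None:
        # Extra context: the reference lets the verifier locate where the candidate diverges.
        lines.append(f"Reference solution: {reference}")
    lines.append("Candidate solution:")
    lines += [f"Step {i}: {step}" for i, step in enumerate(steps, start=1)]
    lines.append("Reply 'CORRECT', or 'ERROR at step <k>: <reason>'.")
    return "\n".join(lines)


_ERROR_RE = re.compile(r"ERROR at step (\d+):\s*(.+)", re.DOTALL)


def parse_verdict(text: str) -> Verdict:
    """Turn a raw verifier reply into a structured, localized verdict."""
    text = text.strip()
    if text.startswith("CORRECT"):
        return Verdict(True, None, "")
    match = _ERROR_RE.search(text)
    if match:
        return Verdict(False, int(match.group(1)), match.group(2).strip())
    # Unparseable reply: treat as wrong but without a located step.
    return Verdict(False, None, text)
```

A verdict such as `parse_verdict("ERROR at step 2: 2 + 6 is 8, not 9")` carries both the location and the reason, which is the kind of directed signal a generator can act on, unlike a bare right/wrong label.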


Chen Wu@ChenHenryWu

At test time, the trained verifier is what makes refinement actually scale. STV roughly doubles pass@1 on hard math, and with enough verification compute, the STV-guided 8B model even beats a model 4× its size. 3/5
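A minimal sketch of a verifier-guided refinement loop of the kind described: generate once, then alternate verification and revision, with `max_rounds` standing in for "verification compute". The function names, the feedback format, and the toy generator and verifier are hypothetical stand-ins for model calls; the thread does not say whether STV spends its compute on sequential rounds like this or on parallel candidates.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    correct: bool
    feedback: str = ""


Generator = Callable[[str, Optional[str]], str]  # (problem, feedback) -> solution
Verifier = Callable[[str, str], Verdict]  # (problem, solution) -> verdict


def refine_with_verifier(problem: str, generate: Generator, verify: Verifier, max_rounds: int) -> str:
    """Generate a solution, then make at most max_rounds verify -> revise passes."""
    solution = generate(problem, None)
    for _ in range(max_rounds):
        verdict = verify(problem, solution)
        if verdict.correct:
            break
        # Feed the verifier's localized critique back to the generator.
        solution = generate(problem, verdict.feedback)
    return solution


# Toy stand-ins (hypothetical): the generator errs until it receives feedback.
def toy_generate(problem: str, feedback: Optional[str]) -> str:
    return "9" if feedback is None else "8"


def toy_verify(problem: str, solution: str) -> Verdict:
    if solution == "8":
        return Verdict(True)
    return Verdict(False, "Step 2: 2 + 6 is 8, not 9")


print(refine_with_verifier("2 + 2 * 3", toy_generate, toy_verify, max_rounds=0))  # 9
print(refine_with_verifier("2 + 2 * 3", toy_generate, toy_verify, max_rounds=2))  # 8
```

With zero rounds the loop returns the unverified first draft; each additional round buys one more verify-and-revise pass. Refinement only helps as far as the verifier's verdicts are accurate, which is why training the verifier is what lets this loop scale.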


Chen Wu@ChenHenryWu

We see learning to verify as a promising lever for training- and test-time self-improvement, perhaps as important as scaling data and generators.

See our paper for results on more domains, weak-to-strong verification, and ablations.

📄 paper: http://arxiv.org/abs/2605.30290 💻 code: https://github.com/ar-forum/stv 🌐 site: http://ar-forum.github.io/stv-webpage

This work is done at CMU with @AdtRaghunathan. 5/5
