
CMU's Ruslan Salakhutdinov and co-authors release a framework to scale test-time compute for complex, sequential AI coding tasks

Distilled summaries of past attempts drive iterative code refinement.



Original post

Russ Salakhutdinov (@rsalakhu) · #39 in AI

New work on Scaling Test-Time Compute for Agentic Coding:

Paper: https://arxiv.org/abs/2604.16529

This work introduces a test-time scaling framework for agentic coding that converts rollouts into structured summaries capturing key hypotheses, progress, and failure modes while discarding low-signal details.
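A minimal sketch of what such a structured summary could look like, assuming each rollout step has already been tagged with a kind. The post does not say how summaries are produced, so a fixed keep/drop filter stands in for that step here. The `Step` and `Summary` types, the step labels, and `distill` are all hypothetical names, not the paper's schema.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One logged step of a rollout (hypothetical schema)."""
    kind: str  # illustrative labels: "hypothesis", "progress", "failure", or "log"
    text: str


@dataclass
class Summary:
    """Structured summary: only the high-signal parts of a rollout."""
    hypotheses: list[str] = field(default_factory=list)
    progress: list[str] = field(default_factory=list)
    failure_modes: list[str] = field(default_factory=list)


def distill(rollout: list[Step]) -> Summary:
    """Fixed-rule stand-in for a model-written summary.

    Keeps hypothesis/progress/failure steps, drops everything else
    (raw logs, tool output), and removes repeated entries.
    """
    summary = Summary()
    buckets = {
        "hypothesis": summary.hypotheses,
        "progress": summary.progress,
        "failure": summary.failure_modes,
    }
    for step in rollout:
        bucket = buckets.get(step.kind)
        if bucket is not None and step.text not in bucket:
            bucket.append(step.text)
    return summary


# Toy rollout; the test names and messages are made up for illustration.
rollout = [
    Step("log", "$ pytest -x  (hundreds of lines of output)"),
    Step("hypothesis", "off-by-one in paginate()"),
    Step("failure", "first patch broke test_empty_page"),
    Step("progress", "test_last_page now passes"),
    Step("failure", "first patch broke test_empty_page"),
]
print(distill(rollout))
```

In this toy run the raw log line is dropped and the repeated failure is recorded once, which is the "discard low-signal details" idea in its simplest form.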

This enables two forms of inference-time scaling: (1) Recursive Tournament Voting (RTV) for parallel selection via iterative small-group comparisons, and (2) Parallel-Distill-Refine (PDR) for sequential improvement by conditioning new rollouts on distilled summaries.
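The selection half can be pictured with a small sketch of a recursive tournament: split the candidates into small groups, let a judge vote within each group, advance the winners, and repeat until one remains. This assumes `judge` stands in for a model that compares a few candidate rollouts (for instance, their summaries) and returns the index of its pick. The group size, number of votes per group, and handling of a leftover candidate are assumptions, not details from the post.

```python
import random
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def recursive_tournament_vote(
    candidates: Sequence[T],
    judge: Callable[[Sequence[T]], int],
    group_size: int = 3,
    n_votes: int = 3,
) -> T:
    """Pick one candidate through rounds of small-group majority votes.

    `judge` (hypothetical) looks at a small group and returns the index of
    the candidate it prefers; it is queried `n_votes` times per group.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    pool = list(candidates)
    while len(pool) > 1:
        winners = []
        for start in range(0, len(pool), group_size):
            group = pool[start:start + group_size]
            if len(group) == 1:
                winners.append(group[0])  # leftover candidate advances unopposed
                continue
            votes = Counter(judge(group) for _ in range(n_votes))
            best_index, _ = votes.most_common(1)[0]
            winners.append(group[best_index])
        pool = winners  # each round shrinks the pool to about 1/group_size
    return pool[0]


# Toy usage: five candidate patches with a hidden quality score and a judge
# that picks the best member of a group 80% of the time.
quality = {"patch_a": 0.2, "patch_b": 0.9, "patch_c": 0.5, "patch_d": 0.7, "patch_e": 0.1}
rng = random.Random(0)


def noisy_judge(group: Sequence[str]) -> int:
    if rng.random() < 0.8:
        return max(range(len(group)), key=lambda i: quality[group[i]])
    return rng.randrange(len(group))


print(recursive_tournament_vote(list(quality), noisy_judge))
```

Keeping groups small means each comparison stays short enough for a judge to read, while the number of rounds grows only logarithmically with the number of candidates.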

Our approach consistently boosts performance on frontier benchmarks. On SWE-Bench Verified, Claude-4.5-Opus improves from 70.9% to 77.6%, and on Terminal-Bench v2.0 from 46.9% to 59.1%. These gains highlight that effective test-time scaling for long-horizon agents hinges on representation, selection, and reuse, not just on sampling more trajectories.
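For scale, the reported before/after scores translate into absolute gains in percentage points and relative gains as follows. This only restates the post's numbers; the helper name `gains` is illustrative.

```python
# Before -> after scores (percent) as reported in the post.
results = {
    "SWE-Bench Verified": (70.9, 77.6),
    "Terminal-Bench v2.0": (46.9, 59.1),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 1), round(relative, 1)


for bench, (before, after) in results.items():
    pts, rel = gains(before, after)
    print(f"{bench}: +{pts} points ({rel}% relative)")
# SWE-Bench Verified: +6.7 points (9.4% relative)
# Terminal-Bench v2.0: +12.2 points (26.0% relative)
```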

Check out a more detailed thread by @anirudhg9119.

Anirudh Goyal (@anirudhg9119)

How do coding agents get better from experience?

Past Attempts as Interface: Turn rollouts into reusable summaries that future attempts can build on.

http://arxiv.org/abs/2604.16529

2:11 PM · Apr 25, 2026 · 33.8K Views
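The "past attempts as interface" idea, and the Parallel-Distill-Refine loop described in the original post, can be illustrated with a toy sketch. A made-up task (guessing a hidden integer) replaces a coding task, and one-line strings such as "37 too low" replace model-written summaries. `Attempt`, `run_attempt`, `parallel_distill_refine`, and the note format are all hypothetical; only the loop shape (parallel attempts, keep summaries, condition the next round on them) follows the post.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attempt:
    guess: int
    solved: bool
    summary: str  # the only part of the attempt that later attempts see


def run_attempt(target: int, notes: list[str], rng: random.Random) -> Attempt:
    """Toy stand-in for one rollout: guess a hidden integer in [0, 99].

    Past attempts reach it only through distilled notes such as "37 too low",
    which it uses to narrow the range it guesses from.
    """
    lo, hi = 0, 99
    for note in notes:
        value, verdict = note.split(" ", 1)
        if verdict == "too low":
            lo = max(lo, int(value) + 1)
        elif verdict == "too high":
            hi = min(hi, int(value) - 1)
    guess = rng.randint(lo, hi)  # notes never exclude the target, so lo <= hi
    if guess == target:
        return Attempt(guess, True, f"{guess} correct")
    verdict = "too low" if guess < target else "too high"
    return Attempt(guess, False, f"{guess} {verdict}")


def parallel_distill_refine(
    target: int, width: int = 4, rounds: int = 20, seed: int = 0
) -> tuple[Optional[int], int]:
    """Each round runs `width` independent attempts on the same notes (parallel),
    keeps only their one-line summaries (distill), and hands those to the
    next round (refine). Returns (answer or None, rounds used)."""
    rng = random.Random(seed)
    notes: list[str] = []
    for round_no in range(1, rounds + 1):
        batch = [run_attempt(target, notes, rng) for _ in range(width)]
        for attempt in batch:
            if attempt.solved:
                return attempt.guess, round_no
        notes.extend(attempt.summary for attempt in batch)
    return None, rounds


print(parallel_distill_refine(target=42))
```

The toy checks success against the hidden target directly. A real coding agent has no such oracle, so it would need a selection step instead, for example a voting scheme like the RTV described in the original post.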
