AI21 Labs Reverses Agent Pipeline Order to Hit SOTA on Issue Resolution

Original post

Yonatan Belinkov#554

AI21 Labs @AI21Labs

1/5 Our latest Labs in Front piece: Agent pipeline order matters. By reversing a common agent recipe, so that we scale first and enrich second, we reached SOTA on a Dec '25 to Mar '26 slice (123 issues): 60.9%.

6:45 AM · Jun 4, 2026 · 519 Views

Sentiment

Users praise AI21 Labs for reversing its agent pipeline order, saying it enables effective scale-first enrichment rather than blind search on SWE-rebench.

Pos 100.0% · Neg 0.0% · 3 comments with sentiment.

Posts from X

Most Activity

AI21 Labs @AI21Labs

2/5 Started with a baseline: classic ReAct agent (GPT-5.2), single Docker-terminal tool. Baselines on the slice: vanilla 53.8%, enrich-only 55.6%, scale-only (n=5 + LLM judge) 55.4%, enrich-then-scale 57.7%.
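To make the baseline comparison easier to read, the short sketch below computes each configuration's gain over the vanilla agent in percentage points. The scores are the ones quoted in the post; the variable names are illustrative only.

```python
# Scores as reported in the thread (percent of the 123-issue slice resolved).
scores = {
    "vanilla": 53.8,
    "enrich-only": 55.6,
    "scale-only": 55.4,
    "enrich-then-scale": 57.7,
}

baseline = scores["vanilla"]

# Gain over the vanilla ReAct agent, in percentage points.
gains = {name: round(score - baseline, 1) for name, score in scores.items()}

for name, gain in gains.items():
    print(f"{name:>18}: {scores[name]:.1f}% ({gain:+.1f} pts)")
```

On these numbers, enrich-only and scale-only each add under 2 points on their own, while the conventional enrich-then-scale order adds 3.9.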

AI21 Labs @AI21Labs

4/5 Still came in ~$0.30 under Claude Code’s spend at a similar score. So we added a lightweight Test Agent that writes repo tests and filters failing patches, pushing our final result to 60.9% - surpassing Claude Code (60.9% vs 56.2%) at the same cost.

AI21 Labs @AI21Labs

5/5 Takeaway: pipeline order is a hyperparameter. If you're already paying for parallel rollouts, reuse them - they're relevant context, not just candidate answers. Full write-up: [https://www.ai21.com/blog/first-scale-then-enrich-how-the-right-execution-strategy-helped-us-reach-state-of-the-art-on-swe-rebench/?utm_source=org-twitter]
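A minimal sketch of what "scale first, enrich second" could look like in code, assuming the stages compose as described in the thread. It is not AI21's implementation: the function name, the callables (`run_agent`, `enrich`, `passes_tests`, `judge`), and the choice to add one extra rollout on the enriched issue are all hypothetical, and the thread does not say how the write-up actually combines these steps.

```python
from typing import Callable, List

Patch = str  # hypothetical: a candidate patch represented as text


def scale_then_enrich(
    issue: str,
    run_agent: Callable[[str], Patch],           # one agent rollout (assumed interface)
    enrich: Callable[[str, List[Patch]], str],   # builds context from rollouts (assumed)
    passes_tests: Callable[[Patch], bool],       # test-agent style filter (assumed)
    judge: Callable[[str, List[Patch]], Patch],  # LLM-judge style selector (assumed)
    n: int = 5,
) -> Patch:
    # Scale first: n independent rollouts on the raw issue.
    rollouts = [run_agent(issue) for _ in range(n)]

    # Enrich second: reuse the rollouts as context, not just as candidates.
    enriched_issue = enrich(issue, rollouts)

    # Assumption: one more attempt that benefits from the enriched context.
    candidates = rollouts + [run_agent(enriched_issue)]

    # Drop patches that fail the generated tests; fall back to all if none pass.
    surviving = [p for p in candidates if passes_tests(p)] or candidates

    return judge(enriched_issue, surviving)
```

Swapping the first two stages (calling `enrich` before any rollouts exist) would give the enrich-then-scale order, which is why the order can be treated as a tunable setting.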
