AI21Labs Reverses Agent Pipeline Order to Hit SOTA on Issue Resolution

Original post

Yonatan Belinkov#457

AI21 Labs@AI21Labs

1/5 Our latest Labs in Front piece: Agent pipeline order matters. By reversing a common agent recipe - scale first, enrich second - we reached SOTA on a Dec ’25 to Mar ’26 slice (123 issues): 60.9%.

6:45 AM · Jun 4, 2026 · 693 Views

Sentiment

Commenters praise AI21 Labs for reversing its agent pipeline order on SWE-rebench: running scale first lets the enrichment step work from the rollouts instead of searching the repository blind.

Pos: 100.0%

Neg: 0.0%

3 comments with sentiment.

Posts from X

Most Activity

AI21 Labs@AI21Labs

2/5 Started with a baseline: classic ReAct agent (GPT-5.2), single Docker-terminal tool. Baselines on the slice: vanilla 53.8%, enrich-only 55.6%, scale-only (n=5 + LLM judge) 55.4%, enrich-then-scale 57.7%.
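To make the baseline comparison easier to read, here is a small sketch that turns the four scores quoted in post 2/5 into gains over the vanilla agent. The scores come from the post; the function name `gains_over` and the dictionary layout are illustrative assumptions, not anything AI21 published.

```python
# Scores (percent of issues resolved) exactly as quoted in post 2/5.
baselines = {
    "vanilla": 53.8,
    "enrich-only": 55.6,
    "scale-only (n=5 + LLM judge)": 55.4,
    "enrich-then-scale": 57.7,
}


def gains_over(reference: str, scores: dict[str, float]) -> dict[str, float]:
    """Return each configuration's gain over the reference, in percentage points."""
    base = scores[reference]
    return {
        name: round(value - base, 1)
        for name, value in scores.items()
        if name != reference
    }


gains = gains_over("vanilla", baselines)
for name, gain in gains.items():
    print(f"{name}: +{gain:.1f} pts over vanilla")
```

On these figures, enrich-then-scale gains 3.9 points over vanilla, slightly more than the 1.8 and 1.6 points of the two single-stage variants added together (3.4).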

Likes: 3

AI21 Labs@AI21Labs

4/5 Still came in ~$0.30 under Claude Code’s spend at a similar score. So we added a lightweight Test Agent that writes repo tests and filters failing patches, pushing our final result to 60.9% - surpassing Claude Code (60.9% vs 56.2%) at the same cost.
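The filtering step described in post 4/5 can be sketched roughly as follows. This is a minimal illustration under assumptions: the post does not say how the Test Agent runs its tests, so `passes_tests` is a hypothetical callable standing in for that step, and the fallback to the full candidate list when every patch fails is a design choice made here, not something the thread states.

```python
from typing import Callable


def filter_patches(
    patches: list[str],
    passes_tests: Callable[[str], bool],
) -> list[str]:
    """Keep only candidate patches for which the generated tests pass."""
    survivors = [patch for patch in patches if passes_tests(patch)]
    # Assumption: if every candidate fails, return them all so that a later
    # selection step still has something to choose from.
    return survivors or patches


# Hypothetical test outcomes for three candidate patches.
outcomes = {"patch-a": False, "patch-b": True, "patch-c": True}
kept = filter_patches(list(outcomes), lambda patch: outcomes[patch])
print(kept)
```

Any selection step that follows, such as an LLM judge, would then only see the patches that survived the generated tests.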

Replies: 1

AI21 Labs@AI21Labs

5/5 Takeaway: pipeline order is a hyperparameter. If you're already paying for parallel rollouts, reuse them - they're relevant context, not just candidate answers. Full write-up: https://www.ai21.com/blog/first-scale-then-enrich-how-the-right-execution-strategy-helped-us-reach-state-of-the-art-on-swe-rebench/?utm_source=org-twitter

AI21 Labs@AI21Labs

3/5 Reversing (scale-then-enrich) pushed our score to 59.7%. Why? Enrich-first searches a big repo blind from the raw issue. Scale-first hands the extractor your N rollouts - aka a contextual map of where fixes were attempted - so it targets high-probability files.
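One way to picture the "contextual map" idea in post 3/5 is a simple frequency count over the files each rollout touched. This is only a sketch under assumptions: the thread does not describe how the extractor actually uses the rollouts, so the function `rank_files_from_rollouts`, the file paths, and the count-based ranking are all hypothetical.

```python
from collections import Counter


def rank_files_from_rollouts(rollouts: list[list[str]], top_k: int = 3) -> list[str]:
    """Rank files by how many rollouts touched them (hypothetical heuristic)."""
    counts: Counter[str] = Counter()
    for touched in rollouts:
        counts.update(set(touched))  # count each file once per rollout
    # Sort by descending count, then by path for a deterministic order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [path for path, _ in ranked[:top_k]]


# Hypothetical files touched by N = 5 rollouts on the same issue.
rollouts = [
    ["src/parser.py", "src/utils.py"],
    ["src/parser.py"],
    ["src/parser.py", "tests/test_parser.py"],
    ["src/lexer.py", "src/utils.py"],
    ["src/parser.py", "src/utils.py"],
]
targets = rank_files_from_rollouts(rollouts)
print(targets)
```

An enrichment step run after scaling could start from a short list like `targets`, whereas an enrich-first step has only the raw issue text to go on.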

Alex YGift@Radipdegen

@AI21Labs scale first enriches pipeline noise second makes the bar go brr

Rugbist@rugbist_

@AI21Labs small detail but reversing the recipe shifted the whole ceiling

scale first really lets enrichment do its thing

Zayd AS@NatureAI_2023

@AI21Labs I have similar thoughts: order probably matters more than people assume in these agent pipelines. Curious whether this holds outside the Dec-Mar slice too.

Golden Hippie@gamestoneai

@AI21Labs Scale-first gives enrichment a map instead of a blind search. Wonder how many production systems still do it backwards.
