1/5 Our latest Labs in Front piece: agent pipeline order matters. A common agent recipe enriches first and scales second; by reversing it (scale first, enrich second) we reached SOTA on a Dec ’25 to Mar ’26 SWE-rebench slice (123 issues): 60.9%.

2/5 Started with a baseline: classic ReAct agent (GPT-5.2), single Docker-terminal tool. Baselines on the slice: vanilla 53.8%, enrich-only 55.6%, scale-only (n=5 + LLM judge) 55.4%, enrich-then-scale 57.7%.

4/5 Still came in ~$0.30 under Claude Code’s spend at a similar score. So we added a lightweight Test Agent that writes repo tests and filters failing patches, pushing our final result to 60.9% - surpassing Claude Code (60.9% vs 56.2%) at the same cost.
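
The test-based filtering step can be pictured roughly as below. This is a minimal sketch, not the Test Agent itself: `filter_patches` and `passes_tests` are hypothetical names, and the fallback to the unfiltered list when every patch fails is an assumption the thread does not state.

```python
from typing import Callable, List


def filter_patches(
    patches: List[str],
    passes_tests: Callable[[str], bool],
) -> List[str]:
    """Keep only candidate patches that pass the generated repo tests.

    `passes_tests` is a hypothetical stand-in for applying a patch and
    running the tests written for the issue.
    """
    survivors = [patch for patch in patches if passes_tests(patch)]
    # Assumption: if the tests reject everything, keep all candidates
    # rather than returning nothing for the judge to pick from.
    return survivors if survivors else list(patches)
```

Filtering before the final selection means the judge only has to rank patches that already pass the tests, which is presumably where the gain comes from.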

5/5 Takeaway: pipeline order is a hyperparameter. If you're already paying for parallel rollouts, reuse them - they're relevant context, not just candidate answers. Full write-up: [https://www.ai21.com/blog/first-scale-then-enrich-how-the-right-execution-strategy-helped-us-reach-state-of-the-art-on-swe-rebench/?utm_source=org-twitter]
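
To make the ordering point concrete, here is one possible reading of the two orders as code. It is a sketch under assumptions: `enrich`, `rollout`, and `judge` are hypothetical callables standing in for the enrichment step, a single agent run, and the LLM judge, and the exact way rollouts feed enrichment is not specified in this thread.

```python
from typing import Callable, List

Enrich = Callable[[str, List[str]], str]   # (issue, prior patches) -> context
Rollout = Callable[[str, str], str]        # (issue, context) -> candidate patch
Judge = Callable[[str, List[str]], int]    # (issue, patches) -> index of pick


def enrich_then_scale(issue: str, enrich: Enrich, rollout: Rollout,
                      judge: Judge, n: int = 5) -> str:
    """Common order: build context once, then sample n rollouts with it."""
    context = enrich(issue, [])
    patches = [rollout(issue, context) for _ in range(n)]
    return patches[judge(issue, patches)]


def scale_then_enrich(issue: str, enrich: Enrich, rollout: Rollout,
                      judge: Judge, n: int = 5) -> str:
    """Reversed order: sample n rollouts first, then reuse them as context."""
    first_pass = [rollout(issue, "") for _ in range(n)]
    # Assumption: the earlier rollouts are handed to the enrichment step
    # as context instead of being treated only as candidate answers.
    context = enrich(issue, first_pass)
    patches = first_pass + [rollout(issue, context)]
    return patches[judge(issue, patches)]
```

Both functions take the same components and differ only in call order, which is what makes the order something you can tune like any other hyperparameter.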