Mini-SWE-Agent Outperforms Native Harnesses For Claude, GPT, And Gemini

Original post
Sergey Karayev (@sergeykarayev)

This is the most interesting recent benchmark result that I've seen:

The 100-line mini-swe-agent harness gets better performance out of Opus, GPT, and Gemini than their respective bespoke harnesses.

(As measured on the excellent DeepSWE bench).

Why would that be true?

3:54 PM · Jun 4, 2026 · 4.5K Views
Sergey Karayev (@sergeykarayev)

From https://deepswe.datacurve.ai/blog#methodology
