Mini-SWE-Agent Outperforms Native Harnesses For Claude, GPT, And Gemini

Original post by Sergey Karayev (@sergeykarayev)

This is the most interesting recent benchmark result that I've seen:

The 100-line mini-swe-agent harness gets better performance out of Opus, GPT, and Gemini than their respective bespoke harnesses.

(As measured on the excellent DeepSWE bench).

Why would that be true?

3:54 PM · Jun 4, 2026 · 4.5K Views



Posts from X

Sergey Karayev (@sergeykarayev)

From https://deepswe.datacurve.ai/blog#methodology

