8h ago

Opus 4.8 improves DeepSWE benchmark performance by 6% over Opus 4.7 while lowering task costs

OpenAI's GPT-5.5 xhigh continues to outperform the new model.

Original post

Good results! Lines up with my experience

2:34 PM · May 30, 2026

Opus 4.8 is a solid jump over Opus 4.7 on DeepSWE, while also lowering the average cost per task.

However, GPT-5.5 xhigh still beats it by a pretty clear margin while being cheaper.

OpenAI has been cooking insanely hard with its models lately. Really excited to see what GPT-5.6 brings.

That said, I have to admit: I’m starting to really like Opus 4.8 as well.

We’ve entered a moment where both frontier labs keep shipping genuinely impressive models.

Datacurve @datacurve

Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.

9:21 PM · May 30, 2026 · 280.3K Views
2:11 AM · May 31, 2026 · 18K Views
Opus 4.8 improves DeepSWE benchmark performance by 6% over Opus 4.7 while lowering task costs · Digg