Claude Opus 4.8 scores 58% on DeepSWE coding benchmark, trailing GPT-5.5 but at a lower cost per task

Runway’s founder says DeepSWE results match real-world impressions of the models' coding ability.

Original post

elvis @OMARSAR0

The efficiency frontier! Where do you think GPT-5.6 will land?

1:39 PM · May 30, 2026

Quote post

elvis @OMARSAR0

It will also be interesting to track how open models close the gap in the coming months.

Epoch AI@EpochAIResearch

We took another look at the capability gap between open-weight and proprietary models. Since the start of the year, open-weight models have lagged the state of the art by four months.

8:01 PM · May 29, 2026 · 203.9K Views

9:03 PM · May 30, 2026 · 2.2K Views