9h ago

Claude Opus 4.8 scores 58% on DeepSWE coding benchmark, trailing GPT-5.5's 70% despite reducing average task costs to $12.58

Chris Hayduk notes the model ran twice as slow.

651.4K91239289.0K

——0——

Original post

#475elvis@OMARSAR0

The efficiency frontier! Where do you think GPT-5.6 will land?

1:39 PM · May 30, 2026

QUOTE POST

#475elvis@OMARSAR0

It will also be interesting to track how open models close the gap in the coming months.

Epoch AI@EpochAIResearch

We took another look at the capability gap between open-weight and proprietary models. Since the start of the year, open-weight models have lagged the state of the art by four months.

8:01 PM · May 29, 2026 · 264.3K Views

9:03 PM · May 30, 2026 · 3.4K Views

Claude Opus 4.8 scores 58% on DeepSWE coding benchmark, trailing GPT-5.5's 70% despite reducing average task costs to $12.58

Sentiment

Cluster engagement