Anthropic's Opus 4.8 scores 6% higher than Opus 4.7 on the DeepSWE benchmark but still trails OpenAI's GPT-5.4
The updated model reduces the average cost per task
Opus 4.8 is a solid jump over Opus 4.7 on DeepSWE, while also lowering the average cost per task.
However, GPT-5.5 xhigh still beats it by a pretty clear margin while being cheaper.
OpenAI has been cooking insanely hard with its models lately. Really excited to see what GPT-5.6 brings.
That said, I have to admit: I’m starting to really like Opus 4.8 as well.
We’ve entered a moment where both frontier labs keep shipping genuinely impressive models.
Opus 4.8 is now on DeepSWE. At the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering the average cost per task.