Opus 4.8 improves DeepSWE benchmark performance by 6% over Opus 4.7 while lowering task costs
OpenAI's GPT-5.5 continues to outperform the new model.
Opus 4.8 is a solid jump over Opus 4.7 on DeepSWE, while also lowering the average cost per task.
However, GPT-5.5 xhigh still beats it by a pretty clear margin while being cheaper.
OpenAI has been cooking insanely hard with its models lately. Really excited to see what GPT-5.6 brings.
That said, I have to admit: I’m starting to really like Opus 4.8 as well.
We’ve entered a moment where both frontier labs keep shipping genuinely impressive models.
Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.