Mark Chen says GPT-5.5 outperforms Claude Opus 4.8 by 20.7% at half the cost and twice the speed
GPT-5.5 succeeded on 12 more tasks than Claude Opus 4.8 during testing.
The big story here is that GPT-5.5 (high/xhigh) outperforms claude-opus-4.8 (max/xhigh) by 20.7%, succeeding on 12 additional tasks!
More impressive: GPT is roughly half the cost and twice as fast.
OpenAI is back in the game. Overall, this competition is healthy for the industry. I'd love to see a third player rise to the top of the leaderboard!
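
For a sense of scale, here is a rough back-of-the-envelope sketch of how the 20.7% figure and the 12 extra tasks could fit together. It assumes the 20.7% is a relative gain in tasks solved (not a percentage-point difference), which the post does not state; the function names are made up for illustration, and the implied task counts are an inference, not reported numbers.

```python
def implied_baseline(extra_tasks: int, relative_gain: float) -> float:
    """Tasks the weaker model would have solved, if extra_tasks is a relative gain."""
    return extra_tasks / relative_gain


def relative_gain(baseline_solved: int, extra_tasks: int) -> float:
    """Relative improvement of solving extra_tasks more than baseline_solved."""
    return extra_tasks / baseline_solved


# Assumption: 12 extra tasks corresponds to a 20.7% relative improvement.
baseline = round(implied_baseline(12, 0.207))  # nearest whole number of tasks
gain = relative_gain(baseline, 12)

print(baseline, baseline + 12, f"{gain:.1%}")  # prints: 58 70 20.7%
```

Under that assumption, the numbers are consistent with roughly 58 tasks solved versus 70. If the 20.7% were instead a percentage-point gap, the implied totals would be different.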
Opus 4.8 is now on DeepSWE.
On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.