DeepSWE benchmark data shows GPT-5.5 outperforms Claude Opus 4.8 on software engineering tasks and token efficiency
Claude Opus 4.8 cost $12 per task.
——0——
Trying to maximize spend or actually getting your work done?
Opus 4.8 gets score-, time- and token-mogged by GPT-5.5 on DeepSWE
4:59 PM · May 30, 2026 · 18.6K Views
6:32 PM · May 30, 2026 · 6.7K Views
sauce: https://deepswe.datacurve.ai/
4:59 PM · May 30, 2026 · 1.3K Views
*and cost mogged too
5:00 PM · May 30, 2026 · 1.7K Views