3h ago

DeepSWE benchmark data shows GPT-5.5 outperforms Claude Opus 4.8 on software engineering tasks and token efficiency

Claude Opus 4.8 cost $12 per task.

Original post

Opus 4.8 gets score-, time- and token-mogged by GPT-5.5 on DeepSWE

9:59 AM · May 30, 2026

Trying to maximize spend or actually getting your work done?

Lisan al Gaib @scaling01

Opus 4.8 gets score-, time- and token-mogged by GPT-5.5 on DeepSWE

4:59 PM · May 30, 2026 · 18.6K Views
6:32 PM · May 30, 2026 · 6.7K Views

sauce: https://deepswe.datacurve.ai/

4:59 PM · May 30, 2026 · 1.3K Views

*and cost mogged too

5:00 PM · May 30, 2026 · 1.7K Views