14h ago

DeepSWE benchmark results show GPT-5.5 beats Claude Opus 4.8 on coding tasks, scoring 70% at half the cost

GPT-5.5 completed tasks roughly twice as fast (21 min vs 43 min) while using about one-third the output tokens (47k vs 136k).

Original post

Opus 4.8 gets score-, time- and token-mogged by GPT-5.5 on DeepSWE

9:59 AM · May 30, 2026

Nostra culpa for losing the cool-vibe to Claude but if you actually care about quality (or cost!) come try 5.5

Gabriel Chua @gabrielchua

GPT-5.5 going strong on DeepSWE for performance vs cost/time/output tokens

7:32 PM · May 30, 2026 · 20.4K Views
9:30 PM · May 30, 2026 · 6.7K Views

Trying to maximize spend or actually getting your work done?

Lisan al Gaib @scaling01

Opus 4.8 gets score-, time- and token-mogged by GPT-5.5 on DeepSWE

4:59 PM · May 30, 2026 · 48.3K Views
6:32 PM · May 30, 2026 · 14.8K Views

sauce: https://deepswe.datacurve.ai/

4:59 PM · May 30, 2026 · 2.6K Views

*and cost mogged too

5:00 PM · May 30, 2026 · 3K Views

@reach_vb Kudos guys. 5.5 is a great model

Vaibhav (VB) Srivastav @reach_vb

Best part: GPT-5.5 does all of this while being ~3x more token efficient than Opus 4.8. 47k output tokens vs 136k. Oh, it's also cheaper and faster: $6.61/task vs $12.58, 21 min vs 43 min. Enjoy!

11:38 PM · May 30, 2026 · 8.3K Views
2:34 AM · May 31, 2026 · 631 Views

GPT-5.5 is #1 on DeepSWE, a hard long-horizon coding benchmark 🔥

70% pass@1 vs 58% for Claude Opus 4.8.

And GPT-5.5 gets there with: ~2x faster runs, ~1/2 the cost, ~1/3 the output tokens

Literally, better intelligence per dollar, per minute, per task.

11:26 PM · May 30, 2026 · 23.8K Views

Best part: GPT-5.5 does all of this while being ~3x more token efficient than Opus 4.8.

47k output tokens vs 136k.

Oh, it's also cheaper and faster: $6.61/task vs $12.58, 21 min vs 43 min.

Enjoy!

11:38 PM · May 30, 2026 · 8.3K Views
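
The rounded claims in the posts above (~2x faster, ~1/2 the cost, ~1/3 the output tokens) can be checked against the per-task figures they quote. The following is a minimal sketch, not part of the benchmark itself: the dictionary layout and function names are illustrative, and `cost_per_pass` is a hypothetical derived metric that assumes the quoted dollar figures are averages over all attempted tasks, which the posts do not state.

```python
# Per-task figures exactly as quoted in the posts above.
REPORTED = {
    "GPT-5.5": {
        "pass_at_1": 0.70, "cost_usd": 6.61, "minutes": 21, "output_tokens_k": 47,
    },
    "Claude Opus 4.8": {
        "pass_at_1": 0.58, "cost_usd": 12.58, "minutes": 43, "output_tokens_k": 136,
    },
}

RESOURCE_METRICS = ("cost_usd", "minutes", "output_tokens_k")


def resource_ratios(baseline, candidate):
    """Return baseline / candidate per metric (above 1 means the candidate used less)."""
    return {name: baseline[name] / candidate[name] for name in RESOURCE_METRICS}


def cost_per_pass(model):
    """Hypothetical metric: dollars per passed task.

    Assumes cost_usd is averaged over all attempted tasks (not stated in the posts).
    """
    return model["cost_usd"] / model["pass_at_1"]


ratios = resource_ratios(REPORTED["Claude Opus 4.8"], REPORTED["GPT-5.5"])
for name, value in ratios.items():
    print(f"{name}: Opus 4.8 / GPT-5.5 = {value:.2f}x")

for model_name, figures in REPORTED.items():
    print(f"{model_name}: ${cost_per_pass(figures):.2f} per passed task")
```

On the quoted figures the ratios come out to roughly 1.9x for cost, 2.0x for time, and 2.9x for output tokens, which is consistent with the rounded claims. Under the stated assumption, the derived cost per passed task is about $9.44 for GPT-5.5 and about $21.69 for Claude Opus 4.8.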