DeepSWE benchmark results show GPT-5.5 beats Claude Opus 4.8 on coding tasks, scoring 70% pass@1 vs 58% at roughly half the cost.
GPT-5.5 completed tasks about twice as fast using roughly one-third the output tokens.
Nostra culpa for losing the cool-vibe to Claude, but if you actually care about quality (or cost!), come try 5.5
GPT-5.5 going strong on DeepSWE for performance vs cost/time/output tokens
Trying to maximize spend or actually getting your work done?
Opus 4.8 gets score-, time- and token-mogged by GPT-5.5 on DeepSWE
sauce: https://deepswe.datacurve.ai/
*and cost mogged too
@reach_vb Kudos guys. 5.5 is a great model
Best part: GPT-5.5 does all of this while being ~3x more token efficient than Opus 4.8. 47k output tokens vs 136k. Oh, it's also cheaper and faster: $6.61/task vs $12.58, 21 min vs 43 min. Enjoy!
GPT-5.5 is #1 on DeepSWE, a hard long-horizon coding benchmark 🔥
70% pass@1 vs 58% for Claude Opus 4.8.
And GPT-5.5 gets there with: ~2x faster runs, ~1/2 the cost, ~1/3 the output tokens.
Literally, better intelligence per dollar, per minute, per task.
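For anyone who wants to sanity-check the "~2x", "~1/2" and "~1/3" claims, here is a minimal sketch that recomputes the ratios from the numbers quoted in this thread. The dictionary keys and the `ratio` helper are made up for illustration, and it assumes the token and time figures are per-task averages like the cost figure; DeepSWE itself may report them differently.

```python
# Figures as quoted in the thread; assumed to be per-task averages.
gpt_5_5 = {"pass_at_1": 0.70, "cost_usd": 6.61, "minutes": 21, "output_tokens_k": 47}
opus_4_8 = {"pass_at_1": 0.58, "cost_usd": 12.58, "minutes": 43, "output_tokens_k": 136}


def ratio(metric: str) -> float:
    """Opus 4.8 value divided by GPT-5.5 value for one metric (hypothetical helper)."""
    return opus_4_8[metric] / gpt_5_5[metric]


cost_ratio = ratio("cost_usd")          # how many times more Opus 4.8 costs per task
time_ratio = ratio("minutes")           # how many times longer Opus 4.8 runs
token_ratio = ratio("output_tokens_k")  # how many times more output tokens Opus 4.8 emits

# Score gap in percentage points, rounded to absorb floating-point noise.
score_gap_points = round((gpt_5_5["pass_at_1"] - opus_4_8["pass_at_1"]) * 100, 1)

print(f"cost:   {cost_ratio:.2f}x")
print(f"time:   {time_ratio:.2f}x")
print(f"tokens: {token_ratio:.2f}x")
print(f"score gap: {score_gap_points} points")
```

On the quoted numbers this gives roughly 1.9x on cost, 2.0x on time and 2.9x on output tokens, which lines up with the rounded claims above.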


