Claude Opus 4.8 scores 58% on the DeepSWE coding benchmark, trailing GPT-5.5's 70% despite reducing average task costs to $12.58
Chris Hayduk notes the model ran twice as slow.
It will also be interesting to track how open models close the gap in the coming months.
We took another look at the capability gap between open-weight and proprietary models. Since the start of the year, open-weight models have lagged the state of the art by four months.
8:01 PM · May 29, 2026
9:03 PM · May 30, 2026