Claude Opus 4.8 scores 58% on the DeepSWE coding benchmark, trailing GPT-5.5 but with lower per-task costs
Runway’s founder says DeepSWE matches real-world coding impressions.
It will also be interesting to track how open models close the gap in the coming months.
We took another look at the capability gap between open-weight and proprietary models. Since the start of the year, open-weight models have lagged the state of the art by four months.
8:01 PM · May 29, 2026 · 203.9K Views
9:03 PM · May 30, 2026 · 2.2K Views