1h ago

Lisan al Gaib says Claude Opus 4.8 at low effort nearly matches Claude Opus 4.6 high-effort SWE-Bench Pro performance

Extra-high effort yields a 70% pass rate.

Original post

we might have a GPT-5.2-xhigh situation on our hands. Opus 4.8 low thinks almost as much as Opus 4.6 high

10:08 AM · May 28, 2026

@scaling01 the default is iso-compute to 4.7 for coding tasks :pray:

Lisan al Gaib @scaling01

we might have a GPT-5.2-xhigh situation on our hands. Opus 4.8 low thinks almost as much as Opus 4.6 high

5:08 PM · May 28, 2026 · 32.1K Views
5:59 PM · May 28, 2026 · 813 Views

okay might just be the benchmark

5:16 PM · May 28, 2026 · 4.1K Views

this looks much better

5:14 PM · May 28, 2026 · 8.2K Views

PB is seemingly close to being solved, so it was in fact an elicitation (and money) issue

Sadly they don't specify the harness for PB in the system card, while they do for some other benches

5:22 PM · May 28, 2026 · 1.7K Views