16h ago

Internal OpenAI benchmark shows model performance on engineering bottlenecks has stagnated, with GPT-5.5 scoring just 1.7%

GPT-5.2 Codex set the benchmark's peak score of 8.33%.

215833019195.6K

——0——

Original post

#505@SEBKRIEROP

Peter Gostev@PETERGOSTEV

OpenAI has this interesting benchmark of OpenAI's real engineering bottlenecks, where the scores have not moved since launch over a year ago. Some earlier models did even better than 5.5. I wonder what's going on here.

8:55 AM · May 30, 2026

QUOTE POST

#1674xlr8harder@XLR8HARDER

The real explanation is that OpenAI is pursuing a sophon block strategy with its publicly released models.

Peter Gostev@petergostev

3:55 PM · May 30, 2026 · 71.8K Views

5:32 PM · May 30, 2026 · 25K Views

Internal OpenAI benchmark shows model performance on engineering bottlenecks has stagnated, with GPT-5.5 scoring just 1.7%

Cluster engagement

Sentiment