
Internal OpenAI benchmark shows model performance on engineering bottlenecks has stagnated, with GPT-5.5 scoring just 1.7%

GPT-5.2 Codex set the benchmark's peak score of 8.33%.

Original post

OpenAI has this interesting benchmark of OpenAI's real engineering bottlenecks, where the scores have not moved since launch over a year ago. Some earlier models did even better than 5.5. I wonder what's going on here.

8:55 AM · May 30, 2026

The real explanation is that OpenAI is pursuing a sophon block strategy with its publicly released models.

Peter Gostev (@petergostev)


3:55 PM · May 30, 2026 · 71.8K Views
5:32 PM · May 30, 2026 · 25K Views