Internal OpenAI benchmark shows model performance on engineering bottlenecks has stagnated, with GPT-5.5 scoring just 1.7%
GPT-5.2 Codex set the benchmark's peak score of 8.33%.
QUOTE POST
#1674 xlr8harder @XLR8HARDER
The real explanation is that OpenAI is pursuing a sophon block strategy with its publicly released models.
OpenAI has this interesting benchmark of OpenAI's real engineering bottlenecks, where the scores have not moved since launch over a year ago. Some earlier models did even better than 5.5. I wonder what's going on here.
3:55 PM · May 30, 2026 · 71.8K Views
5:32 PM · May 30, 2026 · 25K Views