16h ago

OpenAI's internal engineering benchmark shows model pass rates have stalled below 10%, with GPT-5.5 scoring 1.7%

GPT-5.2 Codex achieved the highest score at 8.33%.

Sentiment: 25% positive, 75% negative

Many users accused OpenAI of sandbagging its models, pointing to the flat, low benchmark scores, while a few praised the bottleneck data or underrated models like 5.2.

7 comments with sentiment.
