2h ago

OpenAI Codex engineering lead Thibault Sottiaux sparks debate on whether practitioners trust standard benchmarks over peer recommendations

Alex Volkov says peer reputation now drives model adoption.

Sentiment

Positive: 36.8%

Negative: 63.2%

Users question whether AI benchmarks can be trusted for new models and say they prefer personal testing. Many criticize benchmarks as misleading and models like Opus 4.8 as slow, while others praise GPT-5.5 and Codex as superior.

19 comments with sentiment.
