http://x.com/i/article/2057694226981257216
Noam Brown, OpenAI o1 co-creator, urges benchmark developers to plot LLM performance against test-time compute
At equal token budgets, GPT-5.5 outperforms GPT-5.4.
Supportive users welcome the proposal for compute-normalized LLM benchmarks as a needed upgrade for tracking real AI progress in terms of test-time compute, while critics accuse labs of deliberately avoiding such evaluations to obscure model gaps.





