We could have this at home, American friends (a government agency that is staffed to do, and allowed to publish, stuff Noam thinks is excellent)
Excellent work from @AISecurityInst investigating the impact of test-time compute budgets for frontier AI model evaluations. They make the case even more convincingly than I could!


