Tech · 4h ago

UK AI Safety Institute finds standard capability scores obscure performance variations from test-time compute budgets

Miles Brundage urged a comparable US evaluation agency



Original post

Miles Brundage @Miles_Brundage · #57 in Tech

We could have this at home, American friends (a government agency that is staffed to do, and allowed to publish, stuff Noam thinks is excellent)

Noam Brown @polynoamial

Excellent work from @AISecurityInst investigating the impact of test-time compute budgets for frontier AI model evaluations. They make the case even more convincingly than I could!

10:33 PM · Jul 2, 2026 · 4.1K Views

Sentiment

Some users praised the proposal for a US AI evaluation agency with direct enthusiasm, while others sarcastically mocked the idea by referencing economic struggles or suggesting it would embarrass the government.

Pos: 50.0%

Neg: 50.0%

3 comments with sentiment.



Posts from X


Views: 185 · Likes: 3 · Replies: 1

Seán Ó hÉigeartaigh @S_OhEigeartaigh

@Miles_Brundage Your British friends understand the need to support you, with your struggling economy and all. Throw you a few pennies. Special relationship, innit.


2h · 185 views · 3 likes

Seán Ó hÉigeartaigh @S_OhEigeartaigh

@Miles_Brundage Going to put a big gold lion statue on the UK AISI building. Operation Embarrass USG into having a Properly Funded AISI.

1h401

Violeta Insights @violetainsights

@Miles_Brundage The "allowed to publish" part is doing a lot of work here

4h · 26

aliama @aliama

@Miles_Brundage Perfect!

3h · 18