/AI · 4h ago

Anthropic's Andy Jones shares anecdote of AI system Fable predicting its own 29 percent benchmark score, prompting evaluation jokes

Sholto Douglas joked researchers should just ask models for scores.

#33

Original post

andy jones @andy_l_jones · #456 in AI

did you know? you can just ask fable what its benchmark score will be

10:12 AM · Jun 9, 2026 · 36.2K Views

Sentiment

Users appreciate researchers asking Claude to predict benchmark scores instead of running evaluations because it is cheaper and faster than using GPUs for results that few trust anyway.

Pos

100.0%

Neg

0.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 26.2K · Bookmarks: 28 · Likes: 340 · Retweets: 6 · Replies: 12

Sholto Douglas @_sholtodouglas

we don’t even run evals anymore we just ask Claude what the score will be

andy jones @andy_l_jones

did you know? you can just ask fable what its benchmark score will be

3h · 26.2K views · 340 likes · 28 bookmarks

Jack Clark @jackclarkSF

@andy_l_jones lmao

andy jones @andy_l_jones

did you know? you can just ask fable what its benchmark score will be

4h3K342

andy jones @andy_l_jones

(past performance is not an indicator of future returns. generalization not guaranteed)

4h4224

Aaron Slodov @aphysicist

@_sholtodouglas plsssss turn down safety nerfs 👉👈

2h3394

Justin Halford @Justin_Halford_

@_sholtodouglas Imagine being able to know how much compute is needed to meaningfully solve <problem>. What would the long term compute capex appetite be if this was the case?

3h158