did you know? you can just ask fable what its benchmark score will be
Anthropic's Andy Jones shares anecdote of AI system Fable predicting its own 29 percent benchmark score, prompting evaluation jokes
Sholto Douglas joked researchers should just ask models for scores.
Commenters joke that asking Claude to predict benchmark scores beats running evaluations, since it is cheaper and faster than tying up GPUs for numbers few people trust anyway.
we don’t even run evals anymore we just ask Claude what the score will be
@andy_l_jones lmao
did you know? you can just ask fable what its benchmark score will be

(past performance is not an indicator of future returns. generalization not guaranteed)

@_sholtodouglas plsssss turn down safety nerfs 👉👈

@_sholtodouglas Imagine being able to know how much compute is needed to meaningfully solve <problem>. What would the long term compute capex appetite be if this was the case?

@_sholtodouglas im confused - help me sholto!

@_sholtodouglas @rickasaurus Fable still only scoring slightly better on my private benchmark btw.

@_sholtodouglas lmao trust the oracle approach
benchmarks: we do them, we just skip the waiting

@andy_l_jones Predictive processing at its finest.

@_sholtodouglas bro saw the future and decided to skip the test

@_sholtodouglas Simple regression ez

@_sholtodouglas honestly this is cheaper and faster, why run 8 GPUs for a number nobody trusts anyway