Nikhil Chandak finds Claude Fable 5 and GPT-5.5 fail to dynamically update forecasts, maintaining flat 20% FutureSim accuracy

The evaluation used post-cutoff data to prevent knowledge contamination.

Nikhil Chandak (@nikhilchandak29)

🚨 Claude Fable 5 on FutureSim 🚨

While Anthropic folks have been using it internally for predicting evals, we actually put it to test on FutureSim.

We found it has very strong priors but fails to update predictions over time and ends up no better than GPT-5.5!

We report our results on Feb-March subset to minimize contamination with Fable's knowledge cutoff of Jan.

Sholto Douglas (@_sholtodouglas)

we don’t even run evals anymore we just ask Claude what the score will be

4:51 AM · Jun 10, 2026 · 5.6K Views

it's impressive how quickly @nikhilchandak29 added the Fable 5 results to our FutureSim benchmark!

Fable 5 is at 20% accuracy - and y'all are saying we don't have unsaturated evals? :-)
