Fable Benchmark Demonstrates Strong Calibration for AI Self-Assessment

Original post

rohit (@krishnanrohit)

🚨 Fable benchmark. Tried to update MarketBench too, with Fable, cc @AndreyFradkin. Fable is very good (much better calibrated) at judging its own capabilities: mean stated confidence 0.85 against a realized 87% pass rate, Brier 0.117, and its rare low-confidence calls landed on genuine traps.

There's probably some leakage of the questions here, though. Reading its writing, the model seemed to *know* what to answer, but the level of contagion is hard to parse.

3:19 AM · Jun 10, 2026 · 3.5K Views
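
The figures quoted in the post (mean stated confidence, realized pass rate, Brier score) can all be derived from per-task confidences and pass/fail outcomes. The sketch below shows one common way to compute them. The function names and the four-task toy data are hypothetical and are not the Fable or MarketBench data; the thread does not say exactly how the reported 0.117 was computed.

```python
from statistics import mean


def brier_score(confidences, outcomes):
    """Mean squared error between stated confidence and the 0/1 outcome.

    Lower is better; 0.0 means every call was fully confident and correct.
    """
    if len(confidences) != len(outcomes):
        raise ValueError("confidences and outcomes must have the same length")
    return mean((c - o) ** 2 for c, o in zip(confidences, outcomes))


def calibration_gap(confidences, outcomes):
    """Mean stated confidence minus realized pass rate.

    Positive means overconfident on average, negative means underconfident.
    """
    return mean(confidences) - mean(outcomes)


# Hypothetical toy data: stated confidence per task, and 1 = passed, 0 = failed.
toy_confidences = [0.9, 0.9, 0.8, 0.3]
toy_outcomes = [1, 1, 1, 0]

print("Brier score:", brier_score(toy_confidences, toy_outcomes))
print("Calibration gap:", calibration_gap(toy_confidences, toy_outcomes))
```

On the post's own numbers, the gap would be 0.85 - 0.87 = -0.02, i.e. slightly underconfident on average.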

rohit (@krishnanrohit)

It bid a flat 0.88 to 0.93 on everything, justifying it with remembered gold patches. I didn't push it further because contamination makes that hard!
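
As a rough sanity check on what a flat bid implies, the Brier score of a constant forecast has a closed form: if every task gets the same confidence p and a fraction r of tasks pass, the score is r(1 - p)^2 + (1 - r)p^2. The sketch below evaluates that formula. It assumes the 87% pass rate from the original post applies to the run being described here, which the thread does not confirm; the function name is hypothetical.

```python
def constant_forecast_brier(p, pass_rate):
    """Brier score when every task receives the same confidence p.

    Passed tasks (fraction pass_rate) each contribute (1 - p)^2,
    failed tasks (fraction 1 - pass_rate) each contribute p^2.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= pass_rate <= 1.0):
        raise ValueError("p and pass_rate must lie in [0, 1]")
    return pass_rate * (1.0 - p) ** 2 + (1.0 - pass_rate) * p ** 2


# Assumption: pass rate taken from the original post (87%).
ASSUMED_PASS_RATE = 0.87

for flat_bid in (0.88, 0.90, 0.93):
    score = constant_forecast_brier(flat_bid, ASSUMED_PASS_RATE)
    print(f"flat bid {flat_bid:.2f} -> Brier {score:.4f}")
```

Under that assumption, a constant bid anywhere in the 0.88 to 0.93 range scores between roughly 0.113 and 0.117, so a Brier score in this region shows that the average confidence matches the pass rate but does not by itself show that the model separates easy tasks from hard ones.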
