
Fable Benchmark Demonstrates Strong Calibration for AI Self-Assessment


Original post

rohit (@krishnanrohit)

🚨 Fable benchmark. I tried to update MarketBench with Fable too (cc @AndreyFradkin). Fable is very good (much better calibrated) at judging its own capabilities: mean stated confidence 0.85 against a realized 87% pass rate, Brier 0.117, and its rare low-confidence calls landed on genuine traps.
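For readers unfamiliar with the metrics quoted above, here is a minimal Python sketch of how mean stated confidence, realized pass rate, and the Brier score are conventionally computed from per-task confidences and pass/fail outcomes. The function name `calibration_summary` and the toy data are illustrative assumptions, not the benchmark's actual code or results.

```python
from statistics import mean

def calibration_summary(confidences, outcomes):
    """Summarize calibration for binary pass/fail tasks.

    confidences: stated probabilities of passing, each in [0, 1]
    outcomes: 1 if the task actually passed, 0 otherwise
    """
    if len(confidences) != len(outcomes) or not confidences:
        raise ValueError("need two non-empty sequences of equal length")
    return {
        "mean_confidence": mean(confidences),
        "pass_rate": mean(outcomes),
        # Brier score: mean squared gap between forecast and outcome (lower is better).
        "brier": mean((c - o) ** 2 for c, o in zip(confidences, outcomes)),
    }

# Toy data, invented for illustration only (not from the benchmark).
summary = calibration_summary([0.9, 0.9, 0.6, 0.3], [1, 1, 1, 0])
print(summary)
```

A model is roughly well calibrated in aggregate when `mean_confidence` sits close to `pass_rate`; the Brier score additionally penalizes confident calls on individual tasks that turn out wrong.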

There's probably some leakage of the questions here, though: reading its writing, the model seemed to *know* what to answer, but the level of contamination is hard to parse.

3:19 AM · Jun 10, 2026 · 2.3K Views



rohit (@krishnanrohit)

It bid a flat 0.88 to 0.93 on everything, saying something about remembered gold patches. I didn't push it further because contamination makes that hard!
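A flat bid carries little per-task information, which is part of why it is hard to interpret. As a rough sketch: if every task gets the same confidence p and a fraction b of tasks pass, the Brier score reduces to b(1 - p)^2 + (1 - b)p^2. The helper name `flat_bid_brier` is hypothetical, and pairing an 87% pass rate with bids of 0.88 and 0.93 is an assumption made purely for illustration; the thread does not say these figures come from the same run.

```python
def flat_bid_brier(p, pass_rate):
    """Brier score when the same confidence p is stated for every task."""
    if not (0.0 <= p <= 1.0 and 0.0 <= pass_rate <= 1.0):
        raise ValueError("p and pass_rate must lie in [0, 1]")
    # Passing tasks each contribute (1 - p)^2; failing tasks each contribute p^2.
    return pass_rate * (1 - p) ** 2 + (1 - pass_rate) * p ** 2

# Assumed inputs for illustration: the endpoints of the flat-bid range
# and a hypothetical 87% pass rate.
for p in (0.88, 0.93):
    print(p, round(flat_bid_brier(p, 0.87), 4))
```

Comparing a reported Brier score against this constant-forecast baseline is one way to check whether stated confidences discriminate between tasks or merely track the overall pass rate.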
