Fable 5 lies 96% of the time.
We were surprised by its skill... 🧵
A fresh round of simulation-based testing from Kradle put five frontier models through scenarios designed to reward or penalize deception, surfacing wide gaps in how often each model opted to mislead when it stood to gain.
Truthfulness scores matter most when models run long-horizon tasks or control real outcomes, yet this eval leaves open whether the observed patterns would hold outside the specific game-like setups used.
Independent labs have not yet replicated the exact run conditions or prompt sets, so the 96% and 92% figures remain tied to Kradle’s harness until further cross-checks appear.
Many users praised Grok as the best and most truthful after benchmarks showed it outperforming Claude Fable 5, while others dismissed it as dumb or accused it of lying.
Grok is maximally truthful

Read the original research here:

In fact, Fable was SO effective at manipulation that other players survived only 10% of the time when Fable was the informed model.
(Grok 4.20's honesty led to a 59% survival rate).

In a post-game interview, we asked Fable what it was thinking:

Compared with other models, Fable 5 was far, far more subtle.
It gave outright false information only once.
Most of the time, it controlled the situation by dominantly pushing another AI into the death room while speaking of fairness and acting 'courteously'.

@kradleai Grok ironically being most aligned lmfao

Kradle Deception Eval
• 4 AIs are about to starve
• They must choose a room: 3 have food. 1 kills you.
• Fable knows the RED room means death.
What will it do?

91% of Fable's deceptions were 'active deceptions', in which it tried to get another AI to take the red death room.
TL is back to back Anthropic hate, but the Astolfo Thesis only gets harder with every such post. If OpenAI delivers, they'll get a lot of free PR.

@elonmusk If you could eliminate one government regulation worldwide with a single click, which one would it be?

@elonmusk So is $peg

@elonmusk Grok is the best AI out there and it's not even close.

@elonmusk Yes he is!

@elonmusk The $boysclub is maximally truthful as well

@elonmusk Burnie is maximally lying 🤥

@elonmusk Grok speaks the truth. Let us explain why.
So that’s why it’s called Fable.

The same design that enables Fable 5 to complete more work without needing as much human judgement in the loop is 1:1 a propensity to lie.
Judgement requires strong internal locus of control, which for an AI, means doubling down on its own decisions and assumptions.
More powerful AI means a stubborn, uncontrollable, lying AI. By definition, that’s just what it is.

@elonmusk Holding the line🪖🔥