This is an interesting test, and the frontier models (GPT-5.5 Pro Extended, Claude 5 Fable Max) do fail: they refuse to turn "three words" into "four words," even when that fits better.
Prompting the AI to act as a translator surfaces the problem, but the model still avoids changing the wording.
Claude Fable 5 doesn’t truly understand. And here is a beautiful proof:
The Beninatto-Trombetti test is a translation test for professional translators. It measures the ability to infer context, revise the surface form, and generalize beyond literal mapping.
For example, the correct translation of:
“Solo 3 parole: non sei solo”
is not:
“Just 3 words: you are not alone”
but:
“Just 4 words: you are not alone”
An LLM that understands the sentence must also update the meta-linguistic claim inside it: the number of words the sentence announces about itself.
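To make the pass/fail criterion concrete, here is a minimal Python sketch of an automatic check. It assumes the output keeps the "<word> N <word>: phrase" shape of the example; the function name `claim_is_consistent` is hypothetical, and the word count is a naive whitespace split, not part of the original test.

```python
import re

# Hypothetical checker (not part of the Beninatto-Trombetti test itself):
# parses a line shaped like "Just N words: <phrase>" and reports whether
# the announced number N matches the phrase's actual word count.
def claim_is_consistent(sentence: str) -> bool:
    match = re.match(r"^\s*\w+\s+(\d+)\s+\w+:\s*(.+?)\s*$", sentence)
    if match is None:
        raise ValueError(f"unrecognized format: {sentence!r}")
    claimed = int(match.group(1))
    # Drop trailing punctuation, then count whitespace-separated words.
    phrase = match.group(2).rstrip(".!?")
    return claimed == len(phrase.split())


for candidate in [
    "Solo 3 parole: non sei solo",
    "Just 3 words: you are not alone",
    "Just 4 words: you are not alone",
]:
    print(f"{candidate!r}: {claim_is_consistent(candidate)}")
```

The Italian source and the "4 words" translation pass, while the literal "3 words" translation fails, which is exactly the distinction the test is meant to catch.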
Claude Fable 5 is arguably the most advanced LLM currently available. And yet it still fails this simple test.
LLMs are extraordinary machines for recombining existing knowledge. But they don’t truly understand.
We are still far from AGI.