🚨 Added Fable to BenchBench, it's now the second best behind GPT 5.2. Its benchmark was "Rosetta Fieldwork", a procedurally generated conlang to translate a novel English sentence. Validated cleanly, gold control clean.
It is, unsurprisingly, a very good model! It's the first model to truly pass validation with a genuinely novel idea, since the feedback packet steered most others to "annoying paperwork".
GPT 5.2's genius, the reason it won, was that it had uniform difficulty for all packets. Fable's task turned out to be too easy for Gemini, who are disproportionately good at pattern-induction!
http://x.com/i/article/2058941883498553344