AI · 4h ago

Fable Model Ranks Second on BenchBench Rosetta Fieldwork Benchmark

Original post

rohit (@krishnanrohit) · #1214 in AI

🚨 Added Fable to BenchBench, it's now the second best behind GPT 5.2. Its benchmark was "Rosetta Fieldwork", a procedurally generated conlang to translate a novel English sentence. Validated cleanly, gold control clean.

It is, unsurprisingly, a very good model! It's the first model to truly pass validation with, like, a genuinely novel idea, since the feedback packet steered most others to "annoying paperwork".

GPT 5.2's genius, the reason it won, was that it had uniform difficulty for all packets. Fable's task turned out to be too easy for Gemini, who are disproportionately good at pattern-induction!

rohit@krishnanrohit

http://x.com/i/article/2058941883498553344

3:08 AM · Jun 10, 2026 · 1.9K Views

Sentiment

Users praise Fable's competitive benchmark scores on BenchBench Rosetta Fieldwork and express optimism it could beat GPT 5.2 with another chance.

Positive: 100.0%

Negative: 0.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 481 · Likes: 4

rohit@krishnanrohit

This is the spread of scores, and why GPT 5.2 is still the winner.

I also think that if we give Fable another shot, given this input, it would probably do better. I haven't done so yet (sorry, intelligence too expensive to meter). It really is a smart model!
