/Tech · 12h ago

Rohit Krishnan adds Fable AI to BenchBench, where it places second behind GPT-5.2 with its "Rosetta Fieldwork" conlang translation benchmark

Fable is the first model to pass validation with a genuinely novel benchmark idea; feedback steered most prior models toward "annoying paperwork."

Original post

rohit @krishnanrohit · #1210 in Tech

🚨 Added Fable to BenchBench; it's now second best, behind GPT 5.2. Its benchmark was "Rosetta Fieldwork": a procedurally generated conlang, with the task of translating a novel English sentence. Validated cleanly, gold control clean.

It is, unsurprisingly, a very good model! It's the first model to truly pass validation with a genuinely novel idea, whereas the feedback packet steered most others toward "annoying paperwork".

GPT 5.2's genius, and the reason it won, was that its benchmark had uniform difficulty across all packets. Fable's task turned out to be too easy for Gemini, which is disproportionately good at pattern induction!

rohit @krishnanrohit

http://x.com/i/article/2058941883498553344

3:08 AM · Jun 10, 2026 · 4.7K Views

Sentiment

The one comment with sentiment praises Fable's BenchBench result and expects it would probably do better if given another attempt.

Positive: 100.0% · Negative: 0.0% (1 comment with sentiment)

Cluster Engagement

Posts from X

Most Activity

Views: 597 · Bookmarks: 1 · Likes: 6

rohit @krishnanrohit

This is the spread of scores, and why GPT 5.2 is still the winner.

I also think that if we give Fable another shot, given this input, it would probably do better. I haven't done so yet (sorry, intelligence is too expensive to meter). It really is a smart model!
