new shape-rotator benchmark
Fable and GPT-5.5 of course far ahead of the field
but now look at GLM-5.2. it's ahead of Gemini 3.5 Flash and Opus 4.8
you can't really benchmaxx a benchmark that was just released
so the GLM-5.2 gains seem more and more like a genuine improvement!
Are AI agents shape rotators? In this new benchmark, we let the models play campaign puzzles in Opus Magnum, a puzzle game by @zachtronics.
Ironically, Claude Opus 4.8 performed poorly, being beaten by GPT-5.5, Gemini 3.5 Flash, and GLM-5.2. Claude Fable 5 crushed them all.