Fable 5 is doing something wild on our FrogsGame post-training task.
It trains a weaker model to solve the puzzle, peaks at 68%, and produces the only ~10x improvement we see across the benchmark.
It spent 17 hours and 25M tokens without a human in sight. 34% pass@1, while every other frontier model averages under 4%.
We will publish a more detailed analysis soon.
Model shaping is still a craft of a few. That's what AI agents are for: learning it and doing it for everyone else.
As part of the FrontierSWE benchmark, we built a 20-hour post-training task on @tinkerapi and found that the real bottleneck is research intuition.












