You really need your own benchmarks. If you are translating hieroglyphics, use Gemini 3.5 Flash. If you are running a vending machine, use Opus 4.8.
(This is one reason why I am skeptical of just swapping out models to optimize for cost or generic benchmark scores without testing first.)
Fable 5 is a large step forward for Anthropic's vision capabilities and effectively ties with GPT-5.5 on HieroglyphBench, my benchmark that tests how well VLMs can transcribe ancient Egyptian hieroglyphs.
However, they're both still far behind the Gemini series, where 3.5 Flash scores more than twice as high.
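
To make "test first" concrete, here is a rough sketch of what a tiny task-specific benchmark could look like. It is not how HieroglyphBench works; the names (`run_benchmark`, `safe_to_swap`, the `tolerance` value) are made up for illustration, a "model" is assumed to be any function that takes an input and returns text, and character-level similarity is just one possible scoring choice.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Tuple

# Assumption: a "model" is any callable that maps an input (e.g. an image id
# or prompt) to a text output. Wrap your real API client to fit this shape.
Model = Callable[[str], str]


def similarity(predicted: str, expected: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, predicted, expected).ratio()


def run_benchmark(
    models: Dict[str, Model], cases: List[Tuple[str, str]]
) -> Dict[str, float]:
    """Return the mean similarity per model over (input, expected) cases."""
    if not cases:
        raise ValueError("need at least one test case")
    scores: Dict[str, float] = {}
    for name, model in models.items():
        total = sum(similarity(model(inp), exp) for inp, exp in cases)
        scores[name] = total / len(cases)
    return scores


def safe_to_swap(
    scores: Dict[str, float],
    current: str,
    candidate: str,
    tolerance: float = 0.02,  # hypothetical threshold; pick your own
) -> bool:
    """True if the candidate scores within `tolerance` of the current model."""
    return scores[candidate] >= scores[current] - tolerance
```

The idea is to run `run_benchmark` on examples from your own task before changing models, and only swap when `safe_to_swap` holds for a threshold you actually trust.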









