In vision, Claude Fable is on par with an *old 3B-active Qwen* (Qwen-Flash is basically just hosted Qwen3.5-35B-A3B). That's all you get as a spillover from general scale.
Claude Fable-5 scores 20.0 on the Eyebench-V3 vision benchmark, tying Qwen3.5-Flash and barely beating Claude Opus 4.7.
Kalomaze suggested a frozen vision encoder might be behind the results.
Users are frustrated that Claude Fable-5 only ties a competitor on the Eyebench-V3 vision benchmark. They complain that Anthropic is not trying hard enough on vision, and some say they prefer other models for vision tasks.
Most Activity
Yeah, it's just one benchmark and I'm exaggerating, but this is directionally true. They're not even trying.
@teortaxesTex God, is it a frozen vision encoder or something? GOD, why is Google mogging them so hard on this?
@teortaxesTex I trust gemma4 26b more for vision than the Sonnets.
@teortaxesTex It is better at spatial reasoning. Vision is not their main focus as of now, but I think this benchmark is heavily concentrated on a very narrow AI blind spot, which makes it more of a test of the vision encoder than of the model.