AI Creativity Benchmark Shows Mid-Tier Models Outperforming Leaders
Think the fact that the leading models aren't necessarily the most creative shouldn't be surprising, though 5.2 leading over Opus 4.6 and GPT 5.5 and Gemini 3.1 should be!
The labs really need to up their game!
http://x.com/i/article/2058941883498553344
Also this will be of interest, in no particular order, to @alexolegimas, @AndreyFradkin, @sebkrier, @emollick, @AlexGDimakis, @METR_Evals and prob several more that I'm not remembering well enough to poke.