1d ago
Claude Opus 4.8 Max takes first on AutomationBench with 15.5%, but critics dispute the model hierarchy
Gemini 3.5 Flash (Low) unexpectedly outscored GPT-5.5 (High).
Sentiment: 0% positive, 100% negative (4 comments)
Many users dismissed Claude Opus 4.8's first place on the AutomationBench leaderboard as unreliable or mistaken. They cited error margins too wide to support the ranking and possible flaws in the benchmark itself.