23h ago

Claude Opus 4.8 Max tops the AutomationBench Zapier workflow leaderboard amid skepticism over model rankings

Skeptics flagged Gemini 3.5 Flash Low beating GPT-5.5 High.

Original post

Opus 4.8 ranks #1 on AutomationBench. AutomationBench measures whether an agent can complete a realistic end-to-end business workflow.

10:20 AM · May 28, 2026

> Gemini 3.5 Flash (low) 12.2%
> GPT 5.5 (high) 11.3%

I intensely distrust any such benchmarks

Lisan al Gaib @scaling01

Opus 4.8 ranks #1 on AutomationBench. AutomationBench measures whether an agent can complete a realistic end-to-end business workflow.

5:20 PM · May 28, 2026 · 13.5K Views
12:54 AM · May 29, 2026 · 7.5K Views