23h ago

Claude Opus 4.8 Max tops the AutomationBench Zapier workflow leaderboard amid skepticism over model rankings

Skeptics flagged the result that Gemini 3.5 Flash (low) outscored GPT-5.5 (high), 12.2% to 11.3%.


Original post

Opus 4.8 ranks #1 on AutomationBench. AutomationBench measures whether an agent can complete a realistic end-to-end business workflow.

QUOTE POST

> Gemini 3.5 Flash (low) 12.2% > GPT 5.5 (high) 11.3%

I intensely distrust any such benchmarks

Lisan al Gaib@scaling01

Opus 4.8 ranks #1 on AutomationBench. AutomationBench measures whether an agent can complete a realistic end-to-end business workflow.

5:20 PM · May 28, 2026 · 13.5K Views

12:54 AM · May 29, 2026 · 7.5K Views
