11h ago

Gemini 3.5 Flash records the highest score on the APEX-Agents-AA benchmark at 47.1% Pass@1, ahead of GPT-5.5 at 37.7% and Claude Opus 4.6 at 33.0%, according to Artificial Analysis data released May 19, 2026.

Separate third-party evaluations show it also doing well on coding, vision and finance tasks.

Original post

Gemini 3.5 Flash ranks #1 on the APEX-Agents-AA benchmark, outperforming much larger models a whole size above it.

6:56 AM · May 21, 2026

Here are some third-party evals I came across for 3.5 Flash this week. It's doing well across agents, coding, vision, finance.

Try it. Share what worked and what didn't. We will fix it. Every failure case makes the next version better.

9:15 PM · May 22, 2026 · 1.8K Views