
Google Gemini 3.5 Flash tops the vals.ai Finance Agent v2 benchmark, beating Claude Opus 4.8

The winning model averaged a cost of $2.51 per test.

Original post

didn't know this until today but gemini 3.5 flash is top of the finance-agent-v2 leaderboard by quite a bit

11:13 AM · May 28, 2026

Another example, different models are better at different things, to get the best requires you to use the best model (and combine results from the best models), which is our primary job going forward!

Gabriel Stengel @GabeStengel

If you scroll down far enough in the blog post... can see that that Gemini 3.5 flash outperforms Opus 4.8 by a BIG margin on Finance Agent benchmark "* Finance Agent v2: Gemini 3.5 Flash scores 57.9% on Finance Agent v2, a significant improvement over Gemini 3.1 Pro." No one model rules them all! As much as any one lab would like you to believe....

5:40 PM · May 28, 2026 · 6.1K Views
6:34 PM · May 28, 2026 · 1.4K Views