1h ago

Google Gemini 3.5 Flash tops the vals.ai Finance Agent v2 benchmark, beating Claude Opus 4.8

The winning model averaged a cost of $2.51 per test.

Original post

#1562 Nataniel Ruiz @NATANIELRUIZG

didn't know this until today but gemini 3.5 flash is top of the finance-agent-v2 leaderboard by quite a bit

11:13 AM · May 28, 2026

QUOTE POST

#1220 rohit @KRISHNANROHIT

Another example: different models are better at different things. Getting the best results requires using the best model (and combining results from the best models), which is our primary job going forward!

Gabriel Stengel @GabeStengel

If you scroll down far enough in the blog post... you can see that Gemini 3.5 Flash outperforms Opus 4.8 by a BIG margin on the Finance Agent benchmark: "Finance Agent v2: Gemini 3.5 Flash scores 57.9% on Finance Agent v2, a significant improvement over Gemini 3.1 Pro." No one model rules them all! As much as any one lab would like you to believe....

5:40 PM · May 28, 2026 · 6.1K Views

6:34 PM · May 28, 2026 · 1.4K Views