Google Gemini 3.5 Flash tops the vals.ai Finance Agent v2 benchmark, beating Claude Opus 4.8
The winning model cost an average of $2.51 per test.
QUOTE POST
#1220 rohit @KRISHNANROHIT
Another example: different models are better at different things. Getting the best results requires using the best model for each task (and combining results from the best models), which is our primary job going forward!
If you scroll down far enough in the blog post... you can see that Gemini 3.5 Flash outperforms Opus 4.8 by a BIG margin on the Finance Agent benchmark: "* Finance Agent v2: Gemini 3.5 Flash scores 57.9% on Finance Agent v2, a significant improvement over Gemini 3.1 Pro." No one model rules them all! As much as any one lab would like you to believe....
5:40 PM · May 28, 2026 · 6.1K Views
6:34 PM · May 28, 2026 · 1.4K Views