1h ago

Google's Gemini 3.5 Flash tops the vals.ai Finance Agent benchmark, beating Claude Opus 4.8 with 57.86% accuracy

The winning run cost $2.51 with 322 seconds of latency