Gemini-3.5-Flash regains the top spot on the Toolathlon leaderboard after five months, with a 56.5 percent Pass@1 score on 108 agent tasks.
Gemini variants also hit 67.42 percent on Terminal-Bench 2.0 physics tasks.
ouch
I miss when Flash was the underrated goat model. I genuinely loved Flash 2 and genuinely tolerated 2.5. 3 was the start of the end. 3.5 is a useless model that should not be used for, well, anything as far as I can tell
I miss the old Flashes too. I didn’t make it to its retirement party; it flashed by. A work of love and dedication to the pursuit of algorithmic efficiency.
I think 3.5 is fine, just not good enough to be a coding model.
@suchenzang At least we still have principles
Oh my god, it scored worse than Composer 2! Not even 2.5! And it cost 4x more to run!!!
This might be the worst major lab model drop of all time. Llama 4 tier. Insane.

Gemini Flash 3.5 is now on CursorBench, our main coding agent eval. We’ll keep updating the leaderboard as new models come out. https://cursor.com/evals
Video is up btw
I'm scared to make this video, but I feel like I have to. It's time to talk about Google.
Flash 2 was the last great Google model.