Google’s Gemini 3.5 Flash scores 55 on the Artificial Analysis Intelligence Index, a nine-point gain over Gemini 3 Flash, while leading the intelligence-speed frontier and exceeding 280 output tokens per second

It posts 76.2 percent on Terminal-bench 2.1.

Original post

#420Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@TEORTAXESTEX

Google is held to an unreasonably high standard. Flash would ordinarily be 10% of the cost of GPT-5.5. We're not in the age where Only Google hill-climbs hard math. Even fucking Anthropic ships models that beat… DeepSeek. Everyone is serious now.

3:12 PM · May 18, 2026

QUOTE POST

#37François Chollet@FCHOLLET

Gemini

6:56 PM · May 19, 2026 · 38.8K Views

QUOTE POST

#83rohan anil@_AROHAN_

Knowledge cutoff on this is very confusing. Is this a bug? Does Flash not know that vibecoding is a thing now? Does it not know about Claude Code!?

Lisan al Gaib@scaling01

Gemini 3.5 Flash now live in aistudio

5:47 PM · May 19, 2026 · 9.6K Views

6:08 AM · May 20, 2026 · 20.7K Views

QUOTE POST

#83rohan anil@_AROHAN_

I am actually quite confused.

What went wrong here: is all 2025 and 2026 data slop and compute-inefficient? Why would you train a model in 2026 that misses an entire year+ of data and take it to market?

rohan anil@_arohan_

Knowledge cutoff on this is very confusing. Is this a bug? Does Flash not know that vibecoding is a thing now? Does it not know about Claude Code!?

6:08 AM · May 20, 2026 · 20.7K Views

6:38 AM · May 20, 2026 · 14.2K Views

#83rohan anil@_AROHAN_

@scaling01 @PMinervini Have you tried it on Antigravity? It's a slight improvement on tool calling, but not a model to use daily for coding.

Lisan al Gaib@scaling01

meh doesn't even beat Kimi or GLM

5:54 PM · May 19, 2026 · 49.1K Views

6:44 PM · May 19, 2026 · 3.5K Views

#83rohan anil@_AROHAN_

@scaling01 @PMinervini @eliebakouch @vincentweisser It would be fun for you guys to run this against Claude and Codex in an auto-research loop and see if it has good taste.

rohan anil@_arohan_

@scaling01 @PMinervini Have you tried it on Antigravity? It's a slight improvement on tool calling, but not a model to use daily for coding.

6:44 PM · May 19, 2026 · 3.5K Views

7:01 PM · May 19, 2026 · 1.3K Views

#83rohan anil@_AROHAN_

@scaling01 @PMinervini @eliebakouch @vincentweisser In some sense, Claude and Codex both used human ingenuity and put it together in clever ways. While models lack taste in research, with the right prompting they can be driven to really amazing outcomes. This itself can be an eval if you run it and compare outcomes.

rohan anil@_arohan_

@scaling01 @PMinervini @eliebakouch @vincentweisser It would be fun for you guys to run this against Claude and Codex in an auto-research loop and see if it has good taste.

7:01 PM · May 19, 2026 · 1.3K Views

7:03 PM · May 19, 2026 · 325 Views

#83rohan anil@_AROHAN_

@zephyr_z9 I am curious why you say this. MRCR? This is a good guess/deduction.

Zephyr@zephyr_z9

Clearly has very low active parameters but a lot more total parameters

5:38 PM · May 19, 2026 · 39.1K Views

5:58 PM · May 19, 2026 · 5.6K Views

POST

#359Tanishq Mathew Abraham, Ph.D.@ISCIENCELUVR

Gemini 3.5 Flash announced!

5:27 PM · May 19, 2026 · 309 Views

QUOTE POST

#420Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@TEORTAXESTEX

RL roughly on trend, multimodality on trend, strange to see them report mediocre MRCR and ARC-AGI-2. Given the speed, it might well have fewer active parameters than Flash-3 (so they both shrink the batch and grow margin). Will be a successful model until we get some 5.6-Mini.

Lisan al Gaib@scaling01

Gemini 3.5 Flash Benchmarks

5:42 PM · May 19, 2026 · 22.7K Views

5:50 PM · May 19, 2026 · 6.4K Views

#420Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@TEORTAXESTEX

I'm wrong, thanks @yourboiilevi: it's G3 Flash base, they just serve it faster. Interesting.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

5:50 PM · May 19, 2026 · 6.4K Views

5:58 PM · May 19, 2026 · 45.9K Views

#420Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@TEORTAXESTEX

@zephyr_z9 same model as 3 flash

Zephyr@zephyr_z9

Clearly has very low active parameters but a lot more total parameters

5:38 PM · May 19, 2026 · 39.1K Views

6:02 PM · May 19, 2026 · 1.3K Views

QUOTE POST

#420Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@TEORTAXESTEX

I think one neglected area in model evals is case studies of LLM-Hard questions. Like, here we see that literally nothing can crack #10 and #12 ArXivMath in a few shots. (somehow #6 yields to… Qwen-2B). If we aren't just training on test, CoTs of such problems deserve scrutiny.

Jasper Dekoninck@j_dekoninck

Meh. On MathArena, Gemini 3.5 Flash is neither bad nor great. It is very fast though: I ran 1000 queries in 30 minutes.

8:28 PM · May 19, 2026 · 11K Views

9:12 PM · May 19, 2026 · 4.3K Views

#420Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@TEORTAXESTEX

Now admittedly it got 4 tries vs 3-4 for others, but still, lmao. On apex-shortlist we see that top models struggle with #18 but those below them do not. Might it just be a ground truth failure? @j_dekoninck

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

9:12 PM · May 19, 2026 · 4.3K Views

9:15 PM · May 19, 2026 · 2.1K Views

#420Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@TEORTAXESTEX

IMO #6 was a famous one. Recently got cracked with GPT 5.5 Pro. But that's not very interesting. A year later OpenAI's best can do this hard thing everyone was aware of, duh. Tells us little.

The recent OpenDeepThink from @wenhaocha1 et al (which I kinda reproduced) boasts +400 Elo on CF, but they also say this: "Of the seventeen unsolved problems across the Flash and 2.5 Pro runs, none crosses 5% pass@1 in any generation". I do not think models of this era literally do not have the "competence" for solving any particular programming task, it's all compositional. So I am generally much more intrigued by techniques that can break through this barrier than by amplifying pass@k by changing how we fiddle with K and partition it into l, m, n. Likewise for methods like PaCoRe from @StepFun_ai or the new MCTS from @ZyphraAI.

How do we get unsolvable things solved by trading compute for intelligence rather than "performance"? Ultimately that's the whole promise of this journey to AGI via scaling, isn't it? Is there a way that doesn't just rely on iterative training of models on synthetic data? If that were all, we're at risk of having to do exponentially costly search for data recipes that do not exceed inherent capability of models and thus lead to narrow-generalizing memorization of patterns and more false promises. Yes, we can evidently stack these chairs to a dizzying height if money is no issue, but could we at least evolve to processing them into plywood already?

Might be a reason GDM is so calm in the face of two "startups"; why Gemini is half-assing this main vector of market competition that is agentic SWE. Demis suspects that AGI has to be done the hard way, from the ground up. Raw bytes, universal predictors, world models; removing layers of human-digested slop between downstream outputs and bare metal as your stockpile of metal grows. The "synthetic data" stockpile might prove to be fairy gold if he's right.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

9:15 PM · May 19, 2026 · 2.1K Views

9:32 PM · May 19, 2026 · 1.4K Views

POST

#562Aaron Levie@LEVIE

Gemini 3.5 Flash is out, and it's a major jump over Gemini 3 Flash in model capability for knowledge work. We've been evaluating it on our Box AI Complex Work Eval in early release, and the model delivers a 12 percentage point jump on complex document tasks.

For testing this model, we give the Box AI Agent (using Gemini 3.5) complex problems to solve that represent common but difficult knowledge worker tasks in banking, consulting, public sector, healthcare, and other industries. These tasks can be things like drafting reports, doing due diligence, and more, given a set of relevant documents.

In our tests, Gemini 3.5 Flash delivered jumps across every industry, including:

* Financial services: 81% vs 73% (+8pp)
* Public sector: 76% vs 59% (+17pp)
* Healthcare: 73% vs 51% (+22pp)
* Life Sciences: 67% vs 47% (+20pp)

Incredible to see the continued performance gains.

Gemini 3.5 Flash will be available soon in Box AI Studio and through the Box API. The Box MCP Server will soon be available in the Gemini app with more details to come.

6:29 PM · May 19, 2026 · 21.3K Views

#716elie@ELIEBAKOUCH

@_arohan_ @scaling01 @PMinervini @vincentweisser 👀 could be fun indeed, will look into this

rohan anil@_arohan_

@scaling01 @PMinervini @eliebakouch @vincentweisser It would be fun for you guys to run this against Claude and Codex in an auto-research loop and see if it has good taste.

7:01 PM · May 19, 2026 · 1.3K Views

8:38 PM · May 19, 2026 · 143 Views

QUOTE POST