AI · 7h ago

T3 Stack creator Theo Browne argues that neither Grok nor Gemini has ever been the world's most powerful model

Story Overview

T3 Stack creator Theo Browne is pushing back on any narrative that Grok or Gemini models ever sat at the absolute top of AI capability, framing their reported leads as narrow wins on benchmarks that fail to reflect day-to-day development demands.


#1784

Original post

Theo - t3.gg@theo · #1784 in AI

Neither Grok nor Gemini have ever had the worlds most power model. They held a slight lead in useless benchmarks.

If you ever chose either as daily drivers for serious dev work, I do not trust your judgement at all.

1:48 AM · Jun 10, 2026 · 80K Views

Open Question

Arena scores rarely settle real usage questions

Gemini 2.5 Pro did top the LMSYS Chatbot Arena leaderboard (since renamed LMArena) for a stretch after its March 2025 release, but available records do not show exactly how long that lead lasted or whether it translated into broader superiority.

Developer Impact

Daily coding work sets its own bar

Theo argues that choosing a model because of those leaderboard spikes signals the wrong priorities, because performance in real software projects has not tracked the brief ranking leads.

Sentiment

Many users agreed with Theo's claim that Grok and Gemini never led with the most powerful AI models, while others called the models useless or distrusted the assessment behind the statement.

Positive: 40.0%

Negative: 60.0%

49 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 19.2K

Theo - t3.gg@theo

@ChickenSamosaa You should read the tweet again, here I grabbed it for you.

6h19.2K37

Bookmarks: 5 · Likes: 123

pxlonchain@igetrugd

@theo this is more accurate tbh

6h2.3K1235

Retweets: 1

f4mi ‼️@f4micom

@theo grok is extremely lobotomized all the time and gemini has a Jason Bourne style identity crisis every time you ask it what time it is

5h67026

Replies: 9

Theo - t3.gg@theo

@themmyleke Both Gemini and Grok are massively overpriced and their subsidization was never as good as Codex or Claude Code.

6h2.3K45

Theo - t3.gg@theo

@igetrugd Agreed

6h2K67

Matthew Berman@MatthewBerman

@theo Gemini 2.5 pro was a leader for a brief period of time

Theo - t3.gg@theo

Neither Grok nor Gemini have ever had the worlds most power model. They held a slight lead in useless benchmarks.

If you ever chose either as daily drivers for serious dev work, I do not trust your judgement at all.

1h2.1K410

Theo - t3.gg@theo

@dragosroua Going to be blunt because you should hear this: this is enough information for me to know I would never hire you

6h2.1K331

Them Leke@themmyleke

@theo You are speaking from a place of wealth lol.

3.1 pro was good enough as an implementation tool. I put it and codex in a letagents room, got codex to review and draw up plans and had Gemini write the code. That way I didn’t burn out limits.

But 3.5 is unusable I agree

6h2.5K201

Theo - t3.gg@theo

METR evals

DeepSWE

Any actual contributions to real projects

I seriously do not understand how anyone can use Gemini for dev work unless they are getting it for free. It’s not like “oh it’s 5% worse”, it is literally unusable, constantly looping on nonsense and never making working code.

6h909171

Theo - t3.gg@theo

@ChickenSamosaa “…for serious dev work”

I know reading the whole sentence is hard but you really should try it out some time

6h1.7K31

Theo - t3.gg@theo

@dragosroua I’m sorry but “all models are head to head now” is actually the dumbest thing I’ve seen anyone say on this app in years

6h1K20

Dragos Roua@dragosroua

@theo Gemini is quite ok. Used it for a very complex localization feature and it executed well.

Claude and ChatGPT have good UI, VERY good marketing and improved harnesses, but purely from inference perspective all models are head to head now.

6h1.9K81

Leon Lin@LexnLin

@theo

trust

6h58619

Theo - t3.gg@theo

@konopka_tg Sure, but everything that makes it “most powerful” requires tool calls, which only 2 labs are good at doing for long tasks.

6h1.2K9

Dragos Roua@dragosroua

@theo You surely have some metrics for this that you can share. Something more than “I just think it is like this, period”. If you just think it is like that, it’s perfectly valid, can’t shit on people taste. But if you have metrics, I’m ready to look into them.

6h7287

ChickenSamosa@ChickenSamosaa

@theo Not everyone's a troll dude. Relax.

6h7047

Them Leke@themmyleke

@theo I had a Gemini Ultra plan, that thing never ran out as fast as Codex I promise you and it was a family plan which had three of my buddies on it as well.

6h2187

HSVSphere@HSVSphere

@theo Hey, shitposting on X dot com is serious work

3h23513

Ashutosh Tiwari@ashutosh_270497

@theo Yes definitely, only the real powerful model competition is actually between OpenAI and Claude to launch world’s most powerful SOTA model.

6h10011

Dragos Roua@dragosroua

@theo Any LLM based on transformers is just a ginarmous GIGO machine: you feed it garbage it gives you back garbage. This is an outrageous simplification but useful for the conversation. If you know what questions to ask, you get better results. As for metrics…

6h5506