AI · 7h ago

T3 Stack creator Theo Browne argues that neither Grok nor Gemini has ever been the world's most powerful model

Story Overview

T3 Stack creator Theo Browne is pushing back on any narrative that Grok or Gemini models ever sat at the absolute top of AI capability, framing their reported leads as narrow wins on benchmarks that fail to reflect day-to-day development demands.

Original post
Theo - t3.gg @theo

Neither Grok nor Gemini have ever had the worlds most power model. They held a slight lead in useless benchmarks.

If you ever chose either as daily drivers for serious dev work, I do not trust your judgement at all.

1:48 AM · Jun 10, 2026 · 80K Views
Open Question

Arena scores rarely settle real usage questions

Gemini 2.5 Pro did reach the top of the LMArena (formerly LMSYS) leaderboard for a stretch after its March 2025 release, but available records do not make clear how long that lead lasted or whether it translated into broader superiority.

Developer Impact

Daily coding work sets its own bar

Theo argues that choosing a model on the strength of those leaderboard spikes signals the wrong priorities, since performance in real software projects has not tracked the brief ranking leads.

Sentiment

Many users agreed with Theo's claim that Grok and Gemini have never had the most powerful model, with some calling the models unusable for development, while others defended Gemini or questioned the evidence behind the statement.

Positive: 40.0% · Negative: 60.0% (49 comments with sentiment)
Cluster Engagement
Posts from X

@ChickenSamosaa You should read the tweet again, here I grabbed it for you.

6h · Views 19.2K · Likes 37
pxlonchain @igetrugd

@theo this is more accurate tbh

6h · Views 2.3K · Likes 123 · Bookmarks 5
f4mi ‼️ @f4micom

@theo grok is extremely lobotomized all the time and gemini has a Jason Bourne style identity crisis every time you ask it what time it is

5h · Views 670 · Likes 26

@themmyleke Both Gemini and Grok are massively overpriced and their subsidization was never as good as Codex or Claude Code.

6h · Views 2.3K · Likes 45
Matthew Berman @MatthewBerman

@theo Gemini 2.5 pro was a leader for a brief period of time

1h · Views 2.1K · Likes 41 · Bookmarks 0

@dragosroua Going to be blunt because you should hear this: this is enough information for me to know I would never hire you

6h · Views 2.1K · Likes 33 · Bookmarks 1
Them Leke @themmyleke

@theo You are speaking from a place of wealth lol.

3.1 pro was good enough as an implementation tool. I put it and codex in a letagents room, got codex to review and draw up plans and had Gemini write the code. That way I didn’t burn out limits.

But 3.5 is unusable I agree

6h · Views 2.5K · Likes 20 · Bookmarks 1

METR evals, DeepSWE, any actual contributions to real projects

I seriously do not understand how anyone can use Gemini for dev work unless they are getting it for free. It’s not like “oh it’s 5% worse”, it is literally unusable, constantly looping on nonsense and never making working code.

6h · Views 909 · Likes 17 · Bookmarks 1

@ChickenSamosaa “…for serious dev work”

I know reading the whole sentence is hard but you really should try it out some time

6h · Views 1.7K · Likes 31

@dragosroua I’m sorry but “all models are head to head now” is actually the dumbest thing I’ve seen anyone say on this app in years

6h · Views 1K · Likes 20
Dragos Roua @dragosroua

@theo Gemini is quite ok. Used it for a very complex localization feature and it executed well.

Claude and ChatGPT have good UI, VERY good marketing and improved harnesses, but purely from inference perspective all models are head to head now.

6h · Views 1.9K · Likes 8 · Bookmarks 1
Leon Lin @LexnLin

@theo

trust

6h · Views 586 · Likes 19

@konopka_tg Sure, but everything that makes it “most powerful” requires tool calls, which only 2 labs are good at doing for long tasks.

6h · Views 1.2K · Likes 9
Dragos Roua @dragosroua

@theo You surely have some metrics for this that you can share. Something more than “I just think it is like this, period”. If you just think it is like that, it’s perfectly valid, can’t shit on people taste. But if you have metrics, I’m ready to look into them.

6h · Views 728 · Likes 7
ChickenSamosa @ChickenSamosaa

@theo Not everyone's a troll dude. Relax.

6h · Views 704 · Likes 7
Them Leke @themmyleke

@theo I had a Gemini Ultra plan, that thing never ran out as fast as Codex I promise you and it was a family plan which had three of my buddies on it as well.

6h · Views 218 · Likes 7
HSVSphere @HSVSphere

@theo Hey, shitposting on X dot com is serious work

3h · Views 235 · Likes 13
Ashutosh Tiwari @ashutosh_270497

@theo Yes definitely, only the real powerful model competition is actually between OpenAI and Claude to launch world’s most powerful SOTA model.

6h · Views 100 · Likes 1 · Bookmarks 1
Dragos Roua @dragosroua

@theo Any LLM based on transformers is just a ginarmous GIGO machine: you feed it garbage it gives you back garbage. This is an outrageous simplification but useful for the conversation. If you know what questions to ask, you get better results. As for metrics…

6h · Views 550 · Likes 6