Prime Intellect's Elie Bakouch proposes an "LLM World Cup" to compare AI labs using identical 100,000-B200 GPU clusters

Creator Teortaxes predicts OpenAI would narrowly win the challenge.

Original post

elie@eliebakouch

if we let the top 50 people at openai, anthropic, xai/cursor and google with the exact same amount of gpus for 6 months, who would produce the best model?

exact conditions, chinese labs and "rest of the world" polls below, this is the LLM WORLD CUP

10:28 PM · Jun 18, 2026 · 8.2K Views

Sentiment

Most commenters were enthusiastic about the polls on which AI lab would win with equal GPUs: some called the outcome a no-brainer, and others backed bold teams such as OpenAI and Moonshot. A minority found the framing unnerving or a sign of lagging awareness.

Positive: 77.8% · Negative: 22.2%

7 comments with sentiment.

Posts from X

Most Activity

Views: 4.9K · Likes: 31 · Retweets: 1 · Replies: 8

elie@eliebakouch

if we let the top 50 people at deepseek, kimi, zai, minimax with the exact same amount of gpus for 6 months, who would produce the best model?

group B: chinese labs

3h4.9K314

Bookmarks: 7

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

OpenAI wins under these terms I think, but it's close

2h3.2K227

elie@eliebakouch

if we let the top 50 people at the top ai companies in france, canada, korea and the city of rio with the exact same amount of gpus for 6 months, who would produce the best model?

group C: rest of the world

2h2.5K141

elie@eliebakouch

exact conditions:

> gpu count is 100k B200 equivalent (like colossus 2 first cluster) but can be the team's favorite accelerators, the cluster is properly setup in terms of inter/intra node bandwidth, nodes get replaced automatically when there is an issue

> best model is defined as best on a weighted average of AA + cursorbench + frontiercode + metr ECI with a budget of max($30 per task, 5M output tokens per task), the cost per task is based on anthropic's margin on fable inference (but inference stack can be optimized however they want)

> there is 0 bureaucracy, just the 50 "best" people trying to build the best model possible

> they all have access to the same previous generation of AI models (let's say gpt 5.5/opus 4.8, otherwise some people will argue that mythos can build itself etc. which is not the point), this includes synthetic data

> otherwise each team has their own current data, environments, codebase, infra/kernels etc. (you cannot buy new environments etc.)

> the first 3 months are ONLY about scaling the recipe or improving it, you cannot start training the final model or a better model to distill from (this allows chinese labs in a future poll to potentially catch up or not with the US labs). next 3 months are for pre/mid/post training, let's assume no big focus on "safety" etc. (same reason)

> the benchmark choice suggests that we're focusing on text capabilities, but if there is some "knowledge transfer" it's within the rules
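The scoring rule in the conditions above (a weighted average over four benchmarks, under a per-task budget) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post gives no weights, so the equal weights below are hypothetical; scores are assumed to be normalized to [0, 1]; and the budget is read as "at most $30 and at most 5M output tokens per task", which is one possible interpretation of the post's `max(...)` wording. All function and key names are made up for the example.

```python
# Hypothetical equal weights: the post does not say how the four benchmarks are weighted.
WEIGHTS = {"AA": 0.25, "cursorbench": 0.25, "frontiercode": 0.25, "metr_eci": 0.25}

# Per-task caps taken from the post; treating them as two simultaneous limits is an assumption.
MAX_COST_USD = 30.0
MAX_OUTPUT_TOKENS = 5_000_000


def within_budget(cost_usd: float, output_tokens: int) -> bool:
    """Return True if a task attempt respects both assumed caps."""
    return cost_usd <= MAX_COST_USD and output_tokens <= MAX_OUTPUT_TOKENS


def weighted_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-benchmark scores (assumed normalized to [0, 1])."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing benchmark scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight
```

Under this sketch, two teams would be compared by calling `weighted_score` on each team's benchmark results, counting only task attempts for which `within_budget` returns `True`.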

3h1.4K101

Hanchi Sun@sun_hanchi

@eliebakouch Out of all conditions, the most powerful and unrealistic ones are “0 bureaucracy” and an oracle to pick the top 50 ppl magically, which strongly favors big corps

So Meta 1, Google 2, Amazon 3, then Ant 4 and OAI 5

2h871

elie@eliebakouch

(there are a lot more amazing labs btw, there's a limit of 4 per poll + this is just for fun and i'm actually curious to see what people think)

2h78030

Hanchi Sun@sun_hanchi

@eliebakouch I’d say thinky, and I am genuinely confused why they produce so little compared to their talent density

Other than that, maybe meta

2h991

martin@marterrz

@eliebakouch Has to be moonshot no? People going for DS because of how good they are at being $ efficient but the first that really went for scale were the kimibros. They also have the balls to do things like use Muon before it was cool (they made it cool)

2h731

Hanchi Sun@sun_hanchi

@eliebakouch I read ur rules, and the oracle to pick top 50 ppl + 0 bureaucracy is way too OP

So Bytedance 1, Ali 2, Huawei 3, and only afterwards u start to see the small labs

2h178

NobodyExistsOnTheInternet@nullvaluetensor

@eliebakouch Zai only knows how to copy deepseek and forward deploy into real world demand

Deepseek is nerding the shit out of math and stem.

They aren't really comparable

2h81

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@sun_hanchi @eliebakouch I don't think so. Top 50 people in labs know each other; top 50 people in Meta might need time to figure out what to do

2h211

Andrew Ngmi@hide0usk0jima

@eliebakouch Best model assumes LLM slop?

2h66

SK@Samking207

@MoonlitMonkey69 @eliebakouch People still romanticize DeepSeek because of R1; they are doing great, but I picked zai

2h161

Pritish Mishra@pritmish

@eliebakouch hard to choose, they are all such good teams! but I just did @Zai_org because of GLM 5.2

2h29

Moonlit Monkey@MoonlitMonkey69

@eliebakouch I would not pick any of those because they are used to massive compute.

3h29

Chris 🇨🇦@llm_wizard

@eliebakouch Given the word “best” and the qualifiers you provided on what you mean, man, this is a no-brainer that I feel like people are missing.

1h20

yash@yashetal

@eliebakouch SOTA poll less gooo

1h13

ar0cket1@ar0cket1

@eliebakouch Roughly tied here as well

2h13