if we let the top 50 people at openai, anthropic, xai/cursor and google with the exact same amount of gpus for 6 months, who would produce the best model?
exact conditions, chinese labs and "rest of the world" polls below, this is the LLM WORLD CUP
Creator Teortaxes predicts OpenAI would narrowly win the challenge.
Many users expressed enthusiasm for polls on which AI lab would win with equal GPUs, praising setups as no-brainers or highlighting bold teams like OpenAI and Moonshot, while some found the framing unnerving or a sign of lagging awareness.
if we let the top 50 people at deepseek, kimi, zai, minimax with the exact same amount of gpus for 6 months, who would produce the best model?
group B: chinese labs
OpenAI wins under these terms I think, but it's close
if we let the top 50 people at the top ai companies in france, canada, korea and the city of rio with the exact same amount of gpus for 6 months, who would produce the best model?
group C: rest of the world
exact conditions:
> gpu count is 100k B200 equivalent (like colossus 2 first cluster) but can be the team's favorite accelerators, the cluster is properly set up in terms of inter/intra node bandwidth, nodes get replaced automatically when there is an issue
> best model is defined as best on a weighted average of AA + cursorbench + frontiercode + metr ECI with a budget of max($30 per task, 5M output tokens per task), the cost per task is based on anthropic's margin on fable inference (but inference stack can be optimized however they want)
> there is 0 bureaucracy, just the 50 "best" people trying to build the best model possible
> they all have access to the same previous generation of AI models (let's say gpt 5.5/opus 4.8, otherwise some people will argue that mythos can build itself etc. which is not the point), this includes synthetic data
> otherwise each team has their own current data, environments, codebase, infra/kernels etc. (you cannot buy new environments etc.)
> the first 3 months are ONLY about scaling the recipe or improving it, you cannot start training the final model or a better model to distill from (this allows chinese labs in a future poll to potentially catch up or not with the US labs). next 3 months are for pre/mid/post training, let's assume no big focus on "safety" etc. (same reason)
> the benchmark choice suggests that we're focusing on text capabilities, but if there is some "knowledge transfer" it's within the rules
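As a rough illustration of the "weighted average" scoring rule in the conditions, here is a minimal Python sketch. The post does not say how the four benchmarks are weighted or how their scores are put on a common scale, so the equal weights and the assumption that every score is already normalized to the same range are placeholders, and `weighted_score` is a hypothetical helper name. The per-task budget cap is not modeled.

```python
from typing import Mapping

# Benchmark names as listed in the conditions.
BENCHMARKS = ("AA", "cursorbench", "frontiercode", "metr ECI")

# ASSUMPTION: equal weights. The post does not specify the actual weighting.
EQUAL_WEIGHTS = {name: 1.0 for name in BENCHMARKS}


def weighted_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of benchmark scores; weights need not sum to 1.

    ASSUMPTION: all scores are already normalized to a common scale.
    """
    missing = [name for name in weights if name not in scores]
    if missing:
        raise KeyError(f"missing scores for: {missing}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * w for name, w in weights.items()) / total_weight
```

Under this sketch, a team would be ranked by calling `weighted_score(team_scores, EQUAL_WEIGHTS)` for each team and comparing the results; changing the weights can change the ranking, which is why the unspecified weighting matters.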

@eliebakouch Out of all conditions, the most powerful and unrealistic ones are “0 bureaucracy” and an oracle to pick the top 50 ppl magically, which strongly favors big corps
So Meta 1, Google 2, Amazon 3, then Ant 4 and OAI 5
(there are a lot more amazing labs btw, there's a limit of 4 per poll + this is just for fun and i'm actually curious to see what people think)

@eliebakouch I’d say thinky, and I am genuinely confused why they produce so little compared to their talent density
Other than that, maybe meta

@eliebakouch Has to be moonshot no? People going for DS because of how good they are at being $ efficient but the first that really went for scale were the kimibros. They also have the balls to do things like use Muon before it was cool (they made it cool)

@eliebakouch I read ur rules, and the oracle to pick top 50 ppl + 0 bureaucracy is way too OP
So Bytedance 1, Ali 2, Huawei 3, and only afterwards u start to see the small labs

@eliebakouch Zai only knows how to copy deepseek and forward deploy into real world demand
Deepseek is nerding the shit out of math and stem.
They aren't really comparable

@sun_hanchi @eliebakouch I don't think so. Top 50 people in labs know each other; top 50 people in Meta might need time to figure out what to do

@eliebakouch Best model assumes LLM slop?

@MoonlitMonkey69 @eliebakouch People still romanticize DeepSeek cause of R1, they are doing great but I picked zai

@eliebakouch hard to choose, they are all such good teams! but I just did @Zai_org because of GLM 5.2

@eliebakouch I would not pick any of those because they are used to massive compute.

@eliebakouch The word “best”, with the qualifiers provided on what you mean, man, this is a no-brainer that I feel like people are missing.

@eliebakouch SOTA poll less gooo

@eliebakouch Roughly tied here as well