Elie Bakouch and Meta's Lucas Beyer argue a hypothetical AI scaling challenge would reward pre-existing infrastructure over pure capability

Story Overview

The hypothetical LLM World Cup gives the top 50 researchers from each leading lab an identical 100k-GPU cluster for six months. Because the rules explicitly let each team keep its own data, codebases, environments and kernels, the exercise becomes a test of accumulated infrastructure rather than of raw talent starting from zero.

Original post

Lucas Beyer (bl16)@giffmana

@eliebakouch > each team has their own current data, environments, codebase, infra/kernels

This changes the meaning completely compared to how i first understood OP btw, so i think your result will be a mix of interpretations "with their current stack" vs "from scratch"

elie@eliebakouch

exact conditions:

> gpu count is 100k B200 equivalent (like colossus 2 first cluster) but can be the team's favorite accelerators, the cluster is properly setup in terms of inter/intra node bandwidth, nodes get replaced automatically when there is an issue

> best model is defined as best on a weighted average of AA + cursorbench + frontiercode + metr ECI with a budget of max($30 per task, 5M output tokens per task), the cost per task is based on anthropic's margin on fable inference (but inference stack can be optimized however they want)

> there is 0 bureaucracy, just the 50 "best" people trying to build the best model possible

> they all have access to the same previous generation of AI models (let's say gpt 5.5/opus 4.8, otherwise some people will argue that mythos can build itself etc. which is not the point), this includes synthetic data

> otherwise each team has their own current data, environments, codebase, infra/kernels etc. (you cannot buy new environments etc.)

> the first 3 months are ONLY about scaling the recipe or improving it, you cannot start training the final model or a better model to distill from (this allows chinese labs in a future poll to potentially catch up or not with the US labs). next 3 months are for pre/mid/post training, let's assume no big focus on "safety" etc. (same reason)

> the benchmark choice suggests that we're focusing on text capabilities, but if there is some "knowledge transfer" it's within the rules

12:56 AM · Jun 19, 2026 · 298 Views
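
The "best model" rule in the conditions above is a weighted average over four benchmarks. A minimal sketch of that scoring is shown here, under stated assumptions: the post does not give the weights, so the equal weights are placeholders, the function name `weighted_score` is hypothetical, and each benchmark score is assumed to be already normalized to the range 0 to 1. The per-task budget cap is not modeled.

```python
from typing import Mapping

# Benchmark names come from the post. The equal weights are a placeholder
# assumption: the post does not say how the four benchmarks are weighted.
WEIGHTS = {
    "AA": 0.25,
    "cursorbench": 0.25,
    "frontiercode": 0.25,
    "metr_eci": 0.25,
}


def weighted_score(
    scores: Mapping[str, float],
    weights: Mapping[str, float] = WEIGHTS,
) -> float:
    """Return the weighted average of per-benchmark scores.

    Scores are assumed to be pre-normalized to [0, 1] so that the
    benchmarks are comparable (an assumption, not stated in the post).
    """
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing benchmark scores: {sorted(missing)}")
    total_weight = sum(weights.values())
    # Divide by the total so the weights need not sum to exactly 1.
    return sum(weights[name] * scores[name] for name in weights) / total_weight


# Illustrative, made-up scores for one hypothetical team.
example = {"AA": 0.8, "cursorbench": 0.6, "frontiercode": 0.4, "metr_eci": 0.2}
team_score = weighted_score(example)
```

Under any such weighting, a team's ranking depends on the weights chosen, which is one more place where the poll leaves room for interpretation.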

Open Question

The Six-Month Window Locks In Current Tools

With only half a year available, teams cannot realistically rebuild everything from scratch, so the contest measures how well existing recipes and kernels scale under identical hardware instead of who could invent the best approach in isolation.

Industry Shift

Outcomes Now Hinge on Pre-Loaded Advantages

Because the rules bar buying new environments, labs with mature data pipelines and optimized training stacks gain an edge that pure capability debates often overlook. As Beyer points out, the poll result will likely mix two readings of the rules: "with their current stack" and "from scratch".

Posts from X

Matt Henderson@matthen2

@eliebakouch google people would just complain about not having TPUs 😅

elie@eliebakouch

if we let the top 50 people at openai, anthropic, xai/cursor and google with the exact same amount of gpus for 6 months, who would produce the best model?

exact condition, chinese labs and "rest of the world" poll below, this is the LLM WORLD CUP

elie@eliebakouch

@giffmana yes, also wanted to make it """realistic""" in a way and not build everything from scratch in ~6 months (couldn't make the OP longer bc otherwise you can't do a poll 🥲)

basically the goal is that labs only bring "science (data included)+ infra + talents" and compete

elie@eliebakouch

@giffmana top-k talent*
