Epoch AI's Jaime Sevilla and Luke Emberson warn that surging token demand will outpace global Blackwell capacity through 2032

The shortage could force developers to deploy smaller models.

Original post

Are we nearing a compute crunch? In our latest Gradient Update, @luke__emberson and @Jsevillamol estimate how many tokens all the Blackwell chips on Earth could serve, and compare this to total token demand. Direct comparisons are difficult, but it appears demand is growing much faster than supply.

1:35 PM · May 26, 2026

@Jsevillamol Perhaps an underrated fact: "TAI by 2030" depends not just on the speed of R&D, but also on whether TAI fits in X billion parameters.

X is probably 10 at most, assuming an MoE with typical sparsity.

Jaime Sevilla (@Jsevillamol)

Deep dive into token supply and demand! I come away with the impression that there is going to be significant pressure to keep models intended for the general public small in size.

10:16 PM · May 26, 2026 · 1.1K Views
10:51 PM · May 26, 2026 · 42 Views

Unfortunately, for token demand we have very limited information, so all we can offer are some proxies for growth. I hope we can come back to the topic in a few months with more information and a clearer conceptual framework.

Jaime Sevilla (@Jsevillamol)

Another important conclusion on the supply side is that inference is not really compute- or bandwidth-bound. If you have spare resources, engineers will find ways to use them, with techniques like speculative decoding and prefill chunking.

10:16 PM · May 26, 2026 · 227 Views
10:16 PM · May 26, 2026 · 194 Views
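
For readers unfamiliar with speculative decoding, here is a rough sketch of the idea in its simplest (greedy) form: a cheap draft model guesses a few tokens ahead, and the large target model checks them, keeping the guesses that match its own choices. Everything in this block is hypothetical and for illustration only: the function names, the toy "models" (plain Python functions over integer tokens), and the draft length `k` are assumptions, not anything taken from the Epoch AI analysis. Production systems verify all drafted positions in one batched forward pass and often use a sampling-based acceptance rule instead of exact matching.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function (hypothetical interface)


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target-model call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding; the output matches greedy_decode(target, ...)."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1. The cheap draft model proposes up to k tokens.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2. The target model checks each proposed position. These calls stand in
        #    for what would be a single batched forward pass in a real system.
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if expected != tok:
                # First mismatch: keep the agreed prefix plus the target's own token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            # Every drafted token was accepted.
            out.extend(proposal)
    return out


# Toy models (assumptions for the demo): the target counts upward modulo 10,
# and the draft agrees except when the last token is 2, 5, or 8.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] % 3 == 2 else (ctx[-1] + 1) % 10
```

In this sketch the speedup would come only from step 2 being batched: when decoding is limited by memory bandwidth, checking several positions at once uses compute that would otherwise sit idle, which is one way to read the point about spare resources.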
