Epoch AI's Jaime Sevilla and Luke Emberson warn that surging token demand will outpace global Blackwell capacity through 2032.
The shortage could force developers to deploy smaller models.
@Jsevillamol Perhaps an underrated fact: "TAI by 2030" depends not just on the speed of R&D, but also on whether TAI fits in X billion params.
X is probably 10 at most, assuming an MoE with typical sparsity.
Deep dive into token supply and demand! I come away with the impression that there is going to be significant pressure to keep models intended for the general public small in size.
Are we nearing a compute crunch? In our latest Gradient Update, @luke__emberson and @Jsevillamol estimate how many tokens all the Blackwell chips on Earth could serve, and compare this to total token demand. Direct comparisons are difficult, but it appears demand is growing much faster than supply.
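For a rough sense of why model size matters for supply, here is a minimal sketch of a compute-bound ceiling on fleet-wide token throughput. It assumes the common approximation of about 2 FLOPs per active parameter per generated token and ignores attention cost, memory bandwidth, and batching effects, so it is only an upper bound. The function name and every input value are hypothetical placeholders, not figures from the Gradient Update.

```python
def tokens_per_second(num_chips, flops_per_chip, utilization, active_params):
    """Compute-bound upper estimate of fleet-wide decode throughput.

    Assumes ~2 FLOPs per active parameter per token (forward pass only).
    """
    flops_per_token = 2 * active_params
    return num_chips * flops_per_chip * utilization / flops_per_token


# Hypothetical placeholder inputs, not numbers from the Epoch AI analysis.
NUM_CHIPS = 1_000_000      # assumed fleet size
FLOPS_PER_CHIP = 1e15      # assumed peak FLOP/s per chip
UTILIZATION = 0.3          # assumed fraction of peak actually achieved

small = tokens_per_second(NUM_CHIPS, FLOPS_PER_CHIP, UTILIZATION, 10e9)
large = tokens_per_second(NUM_CHIPS, FLOPS_PER_CHIP, UTILIZATION, 100e9)

print(f"10B active params:  {small:.2e} tokens/s")
print(f"100B active params: {large:.2e} tokens/s")
```

Under these assumptions the ceiling scales inversely with active parameter count, so a model with ten times the active parameters yields roughly a tenth of the tokens from the same fleet.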
Unfortunately, we have very limited information on token demand, so all we can offer are some proxies for growth. I hope we can come back to the topic in a few months with more information and a clearer conceptual framework.
Another important conclusion on the supply side is that inference is not really compute or bandwidth bound. If you have spare resources, engineers will find ways to use them, using tools like speculative decoding and prefill chunking.