How many of the big ideas of the past 15 years of AI are downstream of hardware constraints?
The big hardware story over that period is that logic has become way cheaper than data transfer.
Stacking huge numbers of matrix multiplies was perfect for this hardware regime, because matrix multiplication is logic-intensive but requires less data transfer. And so we got matmul-heavy deep learning.
It's interesting to think about what AI would look like in a world where these costs didn't diverge so much.
