Can someone recap what are the fundamental "walls" that prevent making GPUs less energy demanding?
Yaroslav Bulatov says physical data movement, not computation, drives GPU energy consumption because wires cannot shrink as fast as transistors
Story Overview
Yaroslav Bulatov points out that GPU energy use is dominated by the physical act of shuttling data across the chip rather than the arithmetic itself, since wires have not kept pace with transistor shrinkage and traditional RAM or PRAM models treat every operation as unit cost regardless of distance.
When local beats global by orders of magnitude
A 32-bit add might cost around 20 femtojoules while moving two words just one millimeter can burn nearly 100 times more energy, and global memory access can reach 64,000 times the cost, flipping decisions on whether to store or recompute intermediates.
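The store-or-recompute trade-off can be sketched with the figures quoted above. This is a rough illustration only: the constants are the summary's order-of-magnitude numbers, the function names are hypothetical, and every recomputation step is assumed to cost about as much as one 32-bit add.

```python
# Illustrative figures from the summary above; rough orders of magnitude,
# not measurements.
ADD_FJ = 20.0                        # one 32-bit add, in femtojoules
MOVE_1MM_FJ = 100 * ADD_FJ           # moving two words 1 mm: roughly 100x an add
GLOBAL_ACCESS_FJ = 64_000 * ADD_FJ   # global memory access: roughly 64,000x an add


def cheaper_to_recompute(ops_to_recompute: int, reload_cost_fj: float) -> bool:
    """True when redoing the arithmetic costs less energy than reloading.

    Assumes each recomputation step costs about one add (ADD_FJ).
    """
    return ops_to_recompute * ADD_FJ < reload_cost_fj


def break_even_ops(reload_cost_fj: float) -> int:
    """Number of add-sized operations whose energy equals one reload."""
    return int(reload_cost_fj // ADD_FJ)


if __name__ == "__main__":
    print(break_even_ops(MOVE_1MM_FJ))        # adds "worth" one 1 mm move
    print(break_even_ops(GLOBAL_ACCESS_FJ))   # adds "worth" one global access
    print(cheaper_to_recompute(1_000, GLOBAL_ACCESS_FJ))
```

Under these assumptions, an intermediate that takes a thousand adds to rebuild is still cheaper to recompute than to fetch from global memory, but far more expensive than fetching it from a millimeter away.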
Calls for a distance-aware computation model
The discussion revives a proposal for a Parallel Explicit Communication Model that assigns locations to data and processing elements so algorithm complexity reflects real on-chip or off-chip travel costs instead of assuming uniform access.
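A minimal sketch of what a distance-aware cost might look like, assuming a flat 2D layout, rectilinear (Manhattan) wire routing, and a per-word wire cost derived from the "100 times an add for two words over one millimeter" figure. The `Place` type, the function names, and the constants are hypothetical illustrations, not part of the proposed model.

```python
from dataclasses import dataclass

ADD_FJ = 20.0                  # one 32-bit add (figure quoted in the summary)
WIRE_FJ_PER_WORD_MM = 1000.0   # assumed: ~100x an add for two words over 1 mm


@dataclass(frozen=True)
class Place:
    """A location on the chip, in millimeters (hypothetical coordinates)."""
    x_mm: float
    y_mm: float


def distance_mm(a: Place, b: Place) -> float:
    """Manhattan distance, assuming wires run along the two chip axes."""
    return abs(a.x_mm - b.x_mm) + abs(a.y_mm - b.y_mm)


def add_energy_fj(operand_a: Place, operand_b: Place, pe: Place) -> float:
    """Energy of one add when both operands must travel to the processing element."""
    travel = distance_mm(operand_a, pe) + distance_mm(operand_b, pe)
    return ADD_FJ + WIRE_FJ_PER_WORD_MM * travel


if __name__ == "__main__":
    pe = Place(0.0, 0.0)
    print(add_energy_fj(pe, pe, pe))                            # operands already local
    print(add_energy_fj(Place(1.0, 0.0), Place(0.0, 1.0), pe))  # operands 1 mm away
```

A unit-cost RAM or PRAM model would count both calls as one operation; a location-aware cost separates them by roughly two orders of magnitude.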
Negative users dismiss semiconductor fabrication professionals as "fat and lazy" when asked about fundamental limits to GPU energy efficiency.
Most Activity
Can someone recap what are the fundamental "walls" that prevent making GPUs less energy demanding?
@francoisfleuret Most energy is the cost of data movement. It's harder to shrink wires than transistors, so the transport cost came to dominate -- https://cacm.acm.org/opinion/on-the-model-of-computation-point/
Moving bits to and from memory. Because of parasitic capacitance+resistance of the wires. The bigger the memory, the longer the wires. The main trick is to organize the memory hierarchically: registers, small on-chip SRAM, caches of various types, and external RAM. It's all because we have to use hardware multiplexing: reusing the same multiply-accumulate unit for multiple parts of the network.

@francoisfleuret To turn a transistor on or off, you need to change the charge on the gate by stuffing some electrons in or taking some out.
The ones that go in come from the negative power terminal, and the ones that go out go to the positive terminal.
That consumes energy equal to...

@francoisfleuret We have made them much more efficient already in per-FLOP terms by switching to fp4

@francoisfleuret A relevant read from @cHHillee (TL;DR: power consumption is proportional to transistor flips):
https://www.thonking.ai/p/strangely-matrix-multiplications

@francoisfleuret Apart from RAM being slower than compute [which makes current-day GPUs inefficient], they spend a lot of energy in +/*. Compute-in-memory methods like memristor crossbars are thus more efficient, as Kirchhoff's/Ohm's laws naturally do those ops [but they are analog, not digital!].

@francoisfleuret Because you need to translate everything into math description in order to run the rules of how said description evolves on silicon.
Seems like multiple levels of indirection impose the biggest toll on the power consumption?

@francoisfleuret frequent mem transfers?

@francoisfleuret Energy mostly goes in charging and discharging capacitances, and most of those are in the gates and the tiny wires that connect them. Reducing the amount of compute reduces the cost of the gates. Reducing the amount of data movement reduces the cost of the wires.
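The charging-and-discharging picture in this reply matches the standard textbook relation for CMOS dynamic power. The symbols below are conventional notation rather than anything the reply defines: C is the switched capacitance (gates plus wires), V_dd the supply voltage, alpha the fraction of that capacitance switched per cycle, and f the clock frequency.

```latex
E_{\text{cycle}} = C \, V_{dd}^{2},
\qquad
P_{\text{dyn}} = \alpha \, C \, V_{dd}^{2} \, f
```

Less compute lowers the gate share of C, less data movement lowers the wire share, and the quadratic term is why supply voltage matters so much.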

@francoisfleuret Largely a memory bandwidth problem: https://aiafterhours.substack.com/p/the-terawatt-time-bomb-transformers

@francoisfleuret In order to emulate/predict real world event E, we created language M that can describe event E, and then we emulate inference rules for language M on the physical system that has nothing to do with language M, and happens to work because M is really abstract.

@francoisfleuret At the end it's all governed by power equations at device level multiplied by the number of active current flows per calc.
Current CUDA-enabled GPUs support far too many calcs and flows that a specific flow does not need. Hence they take orders of magnitude more power

@francoisfleuret 1.1V

@francoisfleuret Electrons are more energy intensive to control than photons. We can fuck around with quants but the real asymptote is the medium itself

@francoisfleuret the wall is memory bandwidth, not flops, most of the energy just moves weights around

@francoisfleuret energy demanding? well...
An RTX PRO 6000 Blackwell is roughly 781,000 CRAY-1s of 70's compute.
At CRAY-1 efficiency, that’d be ~90 GW. About 90 nuclear reactors...
I still complain of my gpu raising my home office temperature level though...
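The arithmetic behind the 90 GW figure can be checked in a few lines. Only the 781,000 count comes from the post; the CRAY-1 power draw, the GPU board power, and the per-reactor output are assumptions chosen for illustration and should be checked against spec sheets.

```python
# Only CRAY1_EQUIVALENTS comes from the post; the rest are labeled assumptions.
CRAY1_EQUIVALENTS = 781_000   # from the post
CRAY1_POWER_W = 115_000       # assumption: ~115 kW per CRAY-1 installation
GPU_POWER_W = 600             # assumption: approximate board power of the GPU
REACTOR_OUTPUT_GW = 1.0       # assumption implied by "90 GW is about 90 reactors"

total_w = CRAY1_EQUIVALENTS * CRAY1_POWER_W
total_gw = total_w / 1e9
reactors = total_gw / REACTOR_OUTPUT_GW
efficiency_gain = total_w / GPU_POWER_W

print(f"{total_gw:.1f} GW, about {reactors:.0f} reactors")
print(f"implied efficiency gain: about {efficiency_gain:.1e}x")
```

With these inputs the total comes to about 89.8 GW, consistent with the post's rounded figure.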

@francoisfleuret Are you going to pay me

@francoisfleuret Landauer Limit with irreversible computing and data transfer
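For scale, the Landauer bound (k_B T ln 2 per erased bit) can be set against the roughly 20 femtojoule add quoted in the story summary. This is a rough sketch: room temperature of 300 K is an assumption, and comparing a per-bit bound with a whole 32-bit operation only gives an order-of-magnitude feel.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K (exact SI value)
ADD_J = 20e-15       # ~20 fJ for a 32-bit add (figure quoted in the summary)


def landauer_limit_joules(temperature_k: float = 300.0) -> float:
    """Minimum energy to erase one bit at the given temperature."""
    return K_B * temperature_k * math.log(2)


if __name__ == "__main__":
    limit = landauer_limit_joules()
    print(f"Landauer limit at 300 K: {limit:.3e} J per bit")
    print(f"20 fJ add vs one bit erasure: about {ADD_J / limit:.1e}x")
```

At room temperature the bound is just under 3e-21 J per bit, so the quoted add sits several million times above a single bit erasure. That suggests the practical walls discussed in this thread are reached long before the thermodynamic one.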

@francoisfleuret the most fundamental problem is that energy use is not a variable in our systems. When a model fails its evals we throw more energy at the problem