Tech · 9h ago

Yaroslav Bulatov says physical data movement, not computation, drives GPU energy consumption because wires cannot shrink as fast as transistors

Story Overview

Yaroslav Bulatov points out that GPU energy use is dominated by the physical act of shuttling data across the chip rather than the arithmetic itself, since wires have not kept pace with transistor shrinkage and traditional RAM or PRAM models treat every operation as unit cost regardless of distance.

Original post

François Fleuret @francoisfleuret · #577 in Tech

Can someone recap what are the fundamental "walls" that prevent making GPUs less energy demanding?

2:59 AM · Jun 27, 2026 · 66 Views

Cost Pressure

When local beats global by orders of magnitude

A 32-bit add might cost around 20 femtojoules while moving two words just one millimeter can burn nearly 100 times more energy, and global memory access can reach 64,000 times the cost, flipping decisions on whether to store or recompute intermediates.
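As a rough back-of-envelope check, the sketch below turns the quoted ratios into absolute energies and applies them to the store-or-recompute decision. It assumes both ratios are measured against the 20 fJ add, which the summary does not state explicitly, and it assumes the inputs needed for recomputation are already local. All function and constant names are illustrative.

```python
# Figures taken from the summary above; the baseline for the ratios is an assumption.
ADD_FJ = 20.0            # one 32-bit add, in femtojoules
MOVE_1MM_RATIO = 100     # "nearly 100 times" the add, rounded up
GLOBAL_RATIO = 64_000    # assumed to be relative to the add as well


def energy_fj(ratio):
    """Absolute energy in femtojoules for a cost expressed as a multiple of one add."""
    return ADD_FJ * ratio


def cheaper_to_recompute(num_adds, fetch_ratio):
    """True if redoing `num_adds` local adds costs less than one fetch at `fetch_ratio`."""
    return num_adds * ADD_FJ < energy_fj(fetch_ratio)


if __name__ == "__main__":
    print(energy_fj(MOVE_1MM_RATIO))                 # 2000.0 fJ, i.e. about 2 pJ
    print(energy_fj(GLOBAL_RATIO))                   # 1280000.0 fJ, i.e. about 1.28 nJ
    print(cheaper_to_recompute(1000, GLOBAL_RATIO))  # True
```

Under these assumptions, an intermediate that takes a thousand local adds to rebuild is still cheaper to recompute than to fetch from global memory, which is the reversal the summary describes.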

Open Question

Calls for a distance-aware computation model

The discussion revives a proposal for a Parallel Explicit Communication Model that assigns locations to data and processing elements so algorithm complexity reflects real on-chip or off-chip travel costs instead of assuming uniform access.
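To make the idea concrete, here is a minimal sketch of a distance-aware cost function. It is not the model from the linked article: the `Placed` type, the Manhattan-distance metric, and the 50-adds-per-word-millimeter constant (derived from the "two words, one millimeter, roughly 100 times an add" figure) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placed:
    """A datum or processing element with a hypothetical on-chip position in mm."""
    name: str
    x_mm: float
    y_mm: float


def manhattan_mm(a, b):
    """Wire length between two placed items, assuming axis-aligned routing."""
    return abs(a.x_mm - b.x_mm) + abs(a.y_mm - b.y_mm)


def op_cost(pe, operands, compute_cost=1.0, cost_per_word_mm=50.0):
    """Cost of one operation in units of one add: compute plus operand travel."""
    travel = sum(manhattan_mm(pe, operand) for operand in operands)
    return compute_cost + travel * cost_per_word_mm


if __name__ == "__main__":
    pe = Placed("alu0", 0.0, 0.0)
    near = Placed("a", 0.0, 0.0)
    far = Placed("b", 1.0, 0.0)
    print(op_cost(pe, [near, near]))  # 1.0, the same as a unit-cost model
    print(op_cost(pe, [near, far]))   # 51.0, dominated by the 1 mm trip
```

A unit-cost RAM or PRAM model would charge both calls the same amount; once locations are part of the model, the second call is about fifty times more expensive.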

Sentiment

The only comment with detectable sentiment is negative: it dismisses semiconductor fabrication professionals as "fat and lazy" in response to the question about fundamental limits to GPU energy efficiency.

Pos: 0.0%

Neg: 100.0%

1 comment with sentiment.

Via Communications of the ACM

#745

Posts from X

Views 18.5K · Bookmarks 25 · Likes 67 · Retweets 1 · Replies 31

François Fleuret @francoisfleuret

Can someone recap what are the fundamental "walls" that prevent making GPUs less energy demanding?

9h ago · 18.5K views · 67 likes · 25 bookmarks

Yaroslav Bulatov @yaroslavvb

@francoisfleuret Most energy is the cost of data movement. It's harder to shrink wires than transistors, so the transport cost came to dominate -- https://cacm.acm.org/opinion/on-the-model-of-computation-point/

4h ago

Yann LeCun @ylecun

Moving bits to and from memory. Because of parasitic capacitance+resistance of the wires. The bigger the memory, the longer the wires. The main trick is to organize the memory hierarchically: registers, small on-chip SRAM, caches of various types, and external RAM. It's all because we have to use hardware multiplexing: reusing the same multiply-accumulate unit for multiple parts of the network.

27m ago

Matt Timmermans @matt_timmermans

@francoisfleuret To turn a transistor on or off, you need to change the charge on the gate by stuffing some electrons in or taking some out.

The ones that go in come from the negative power terminal, and the ones that go out go to the positive terminal.

That consumes energy equal to...

5h ago

ShadowLibrarian @shad0wlibrarian

@francoisfleuret We have made them much more efficient already in per-FLOP terms by switching to fp4

7h ago

Ilan Fridman Rojas @irfnali1

@francoisfleuret A relevant read from @cHHillee (TL;DR: power consumption is proportional to transistor flips):

https://www.thonking.ai/p/strangely-matrix-multiplications

3h ago

Muktabh Mayank @muktabh

@francoisfleuret Apart from RAM being slower than compute [which makes current-day GPUs inefficient], they spend a lot of energy on +/*. Compute-in-memory methods like memristor crossbars are thus more efficient, as Kirchhoff's and Ohm's laws naturally perform those ops [but they are analog, not digital!].

5h ago

fu. @RobanMavr

@francoisfleuret Because you need to translate everything into a mathematical description in order to run the rules of how that description evolves on silicon.

Seems like multiple levels of indirection impose the biggest toll on power consumption?

6h ago

davinci @leothecurious

@francoisfleuret frequent mem transfers?

8h ago

Donal Fellows @donalfellows

@francoisfleuret Energy mostly goes in charging and discharging capacitances, and most of those are in the gates and the tiny wires that connect them. Reducing the amount of compute reduces the cost of the gates. Reducing the amount of data movement reduces the cost of the wires.

2h ago
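The charging-and-discharging picture can be put into numbers with the standard CMOS switching relations: a full charge and discharge cycle of a capacitance C at supply voltage V draws C·V² from the supply, and average dynamic power is that energy times the clock frequency and an activity factor. The sketch below is illustrative only; the function names are invented here, and any capacitance passed in would be a hypothetical value, not a measurement from a real GPU.

```python
def switching_energy_j(capacitance_f, vdd_v):
    """Energy drawn from the supply for one charge/discharge cycle, in joules."""
    return capacitance_f * vdd_v ** 2


def dynamic_power_w(capacitance_f, vdd_v, freq_hz, activity=1.0):
    """Average dynamic power: activity factor times per-cycle energy times frequency."""
    return activity * switching_energy_j(capacitance_f, vdd_v) * freq_hz


if __name__ == "__main__":
    gate_and_wire_f = 1e-15  # hypothetical 1 fF node, for illustration only
    full = switching_energy_j(gate_and_wire_f, 1.0)
    half = switching_energy_j(gate_and_wire_f, 0.5)
    print(full / half)  # 4.0: halving the voltage quarters the energy
```

The quadratic dependence on voltage is why supply voltage matters so much, and the linear dependence on capacitance is why longer wires cost more per bit moved.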

shwetank @Shwetankumar

@francoisfleuret Largely a memory bandwidth problem: https://aiafterhours.substack.com/p/the-terawatt-time-bomb-transformers

5h ago

fu. @RobanMavr

@francoisfleuret In order to emulate/predict real world event E, we created language M that can describe event E, and then we emulate inference rules for language M on the physical system that has nothing to do with language M, and happens to work because M is really abstract.

6h ago

Yudhishthir Kandel @ykandel

@francoisfleuret At the end it's all governed by power equations at device level multiplied by the number of active current flows per calc.

Current CUDA-enabled GPUs support far too many calculations and flows that are not needed for a specific flow. Hence they take orders of magnitude more power.

6h ago

Jay @treeinnauvis

@francoisfleuret 1.1V

7h ago

👻 @campedersen

@francoisfleuret Electrons are more energy intensive to control than photons. We can fuck around with quants but the real asymptote is the medium itself

7h ago

Suresh @_Suresh2

@francoisfleuret the wall is memory bandwidth, not flops, most of the energy just moves weights around

6h ago

Xavier Rey-Robert @XReyRobert

@francoisfleuret Energy demanding? Well...

An RTX PRO 6000 Blackwell is roughly 781,000 CRAY-1s of 70's compute.

At CRAY-1 efficiency, that’d be ~90 GW. About 90 nuclear reactors...

I still complain of my gpu raising my home office temperature level though...

6h ago
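The ~90 GW figure in the post above can be reproduced with one multiplication. The sketch assumes a CRAY-1 power draw of about 115 kW, a commonly cited figure that the post itself does not state; the 781,000 count is taken from the post as-is.

```python
CRAY1_POWER_W = 115e3        # assumed: commonly cited ~115 kW per CRAY-1
CRAY1_EQUIVALENTS = 781_000  # figure quoted in the post

total_w = CRAY1_POWER_W * CRAY1_EQUIVALENTS
total_gw = total_w / 1e9

if __name__ == "__main__":
    print(round(total_gw, 1))  # 89.8
```

At roughly 1 GW per reactor, which is also an assumption, that lines up with the post's "about 90 nuclear reactors".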

Arthur Wakefield @JerichoLeylines

@francoisfleuret Are you going to pay me

5h ago

Лъчезар Томов @Lptomov82

@francoisfleuret Landauer Limit with irreversible computing and data transfer

7h ago

Fergus Meiklejohn @airuyi

@francoisfleuret the most fundamental problem is that energy use is not a variable in our systems. When a model fails its evals we throw more energy at the problem

4h ago