
Leonardo.AI co-founder Ethan Smith warns that 10x training speedup claims in recent diffusion papers may rely on flawed metrics

Prior studies confirm similar speedup overestimations in Transformer pre-training


Original post

@torchcompiled hi, we had a kind of bitter lesson when trying to look into the "accelerating pre-training" literature: https://arxiv.org/abs/2307.06440

Ethan@torchcompiled

Numerous diffusion papers I’m seeing are citing accelerating training on the order of 10x or so (if not more). Not to mention many of these are orthogonal directions like compression, additional losses/supervision like REPA, token dropping like TREAD, and many more. I’m kinda tempted to say the metric by which acceleration is measured might be off? On the other hand I feel like diffusion/flow could be a more complex fish to fry and might be running in a quite suboptimal way, considering how many design choices there are in representation space, architecture, and parameterization of the diffusion itself. Then, it may legitimately have sizable wins, as fairly low hanging fruit, that aren’t as available to AR models

5:53 AM · May 25, 2026 · 243 Views
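
To make the "metric might be off" point concrete, here is a minimal sketch of one way a training speedup can be computed: the ratio of steps two runs need to reach the same target score. This convention is an assumption for illustration, not something the post attributes to any specific paper. The curves are made-up numbers, and the names steps_to_reach and speedup are hypothetical.

```python
def steps_to_reach(curve, target):
    """Return the first logged step whose score is <= target, or None."""
    for step, score in curve:
        if score <= target:
            return step
    return None


def speedup(baseline, method, target):
    """Ratio of baseline steps to method steps needed to reach `target`."""
    b = steps_to_reach(baseline, target)
    m = steps_to_reach(method, target)
    if b is None or m is None:
        return None  # one run never reaches the target, so the ratio is undefined
    return b / m


# Illustrative (step, FID-like score) pairs; lower is better. NOT real results.
baseline = [(100_000, 30.0), (200_000, 20.0), (400_000, 12.0),
            (800_000, 9.0), (1_000_000, 8.0)]
method = [(40_000, 12.0), (200_000, 10.0), (400_000, 9.0),
          (800_000, 8.0)]

for target in (12.0, 9.0, 8.0):
    print(target, speedup(baseline, method, target))
```

With these invented curves the same pair of runs reads as a 10x speedup at a loose target (12.0), 2x at 9.0, and 1.25x at 8.0, so the headline number depends heavily on where the threshold is set.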


Via arxiv.org


Posts from X

Most Activity

Views: 37 · Retweets: 1

Ethan@torchcompiled

This is not saying that diffusion will surpass AR in acceleration necessarily, but more that LLM improvements have much smaller deltas in gain than diffusion papers, and this might be reflective that the current state of diffusion is clunky

36d · 37

Alex UGift@Radipdegen

@torchcompiled bro forgot to finish the sentence and went off to accelerate a training run mid-type

36d · 6

Lumin@luminxbt

@torchcompiled 10x is insane if that holds across architectures

wonder how much of it is just better data curation vs actual method

36d · 2