Diffusion Papers Claim 10x Training Speedups With Orthogonal Techniques

12:29 AM · May 25, 2026

Numerous diffusion papers I’m seeing cite training speedups on the order of 10x (if not more). Many of these are orthogonal directions: compression, additional losses/supervision like REPA, token dropping like TREAD, and more. I’m kinda tempted to say the metric by which acceleration is measured might be off? On the other hand, I feel like diffusion/flow could be a more complex fish to fry and might be running in a quite suboptimal way, considering how many design choices there are in representation space, architecture, and the parameterization of the diffusion itself. If so, it may legitimately have sizable wins, as fairly low-hanging fruit, that aren’t as available to AR models.

7:32 AM · May 25, 2026 · 818 Views

This is not saying that diffusion will necessarily surpass AR in acceleration, but rather that LLM improvements show much smaller gains than diffusion papers do, which might reflect that the current state of diffusion is clunky.

Ethan @torchcompiled

7:33 AM · May 25, 2026 · 229 Views