
Gordon Wetzstein introduces Spectral Progressive Diffusion, a plug-and-play framework that reduces compute demands for high-resolution image and video generation in DiT models

The original description included spectrum visualizations of progressive frequency generation.

Original post

High-fidelity generation is hitting a scaling crisis as DiT compute grows with image resolution and video length. But do we need high-resolution denoising at every step? We introduce Spectral Progressive Diffusion, a plug-and-play framework for efficient image and video generation that directly exploits the spectral autoregression property of diffusion to grow resolution during denoising. [1/7]

9:00 AM · May 19, 2026

Training-free diffusion sampling speedups by taking advantage of the spectral autoregression property: at high noise, only low frequencies need to be represented accurately, so we can gradually increase the resolution during sampling.
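To make the idea of growing resolution during sampling concrete, here is a minimal sketch of a toy resolution schedule. It is not the method from the thread; it assumes a power-law signal spectrum (power proportional to `f**-beta`), white noise of standard deviation `sigma`, and a simple rule that a frequency is worth representing only while its signal power exceeds the noise power. The names `cutoff_frequency` and `resolution_schedule`, and all constants such as `beta`, `signal_scale`, and `min_res`, are hypothetical choices for illustration.

```python
import numpy as np


def cutoff_frequency(sigma, beta=2.0, signal_scale=1e4, f_max=512.0):
    """Highest frequency whose assumed signal power still exceeds the noise power.

    Assumption: signal power S(f) = signal_scale * f**(-beta), white noise power = sigma**2.
    Solving S(f) = sigma**2 for f gives the cutoff below.
    """
    f_c = (signal_scale / sigma**2) ** (1.0 / beta)
    return float(np.clip(f_c, 1.0, f_max))


def resolution_schedule(sigmas, full_res=1024, min_res=64, beta=2.0, signal_scale=1e4):
    """Pick a working resolution for each noise level, from high noise to low noise."""
    schedule = []
    for sigma in sigmas:
        f_c = cutoff_frequency(sigma, beta, signal_scale, f_max=full_res / 2)
        needed = 2.0 * f_c  # Nyquist: representing f_c cycles needs 2 * f_c samples
        res = min_res
        # Double the resolution until it covers the cutoff or reaches full size.
        while res < needed and res < full_res:
            res *= 2
        schedule.append(min(res, full_res))
    return schedule


# Hypothetical noise levels, ordered from the start of sampling (high) to the end (low).
sigmas = np.geomspace(80.0, 0.02, num=8)
schedule = resolution_schedule(sigmas)
for sigma, res in zip(sigmas, schedule):
    print(f"sigma={sigma:8.3f} -> denoise at {res}x{res}")
```

Under these assumptions the schedule starts at the minimum resolution and only reaches full resolution once the noise level is low. A real sampler would also need a way to upsample the intermediate latent and re-noise it consistently at each transition, which this sketch leaves out.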

Gordon Wetzstein @GordonWetzstein

4:00 PM · May 19, 2026 · 17.3K Views
6:54 PM · May 19, 2026 · 4.1K Views

I’ve seen a lot of variants of this since UpscalingDiffusion, and the motivation makes a ton of sense: reducing image size can be seen as a perturbation of the signal, and noise more quickly destroys the higher frequencies of images, which are typically lower in amplitude. If inference already does something close to predicting coarse-to-fine detail, full-resolution imagery may be overkill.

I’m wondering what the main obstacles to adoption are. Is it complications around distillation? Difficulty handling arbitrary resolutions, or training tricks like sequence packing? Exceptions to the general power-law spectrum of natural images, such as minimalist graphic designs or other images with large areas of solid color?
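The claim that noise swamps high frequencies first can be checked numerically. The sketch below is an illustration under stated assumptions, not anything from the thread: it synthesizes an image with a power-law amplitude spectrum (a stand-in for a natural image), adds white Gaussian noise, and measures the signal-to-noise ratio in radial frequency bands. The helper names `power_law_image` and `band_snr`, the exponent `alpha`, and the band count are all hypothetical.

```python
import numpy as np


def radial_frequency(n):
    """Radial spatial frequency (cycles per pixel) for each bin of an n x n FFT."""
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(n)[None, :]
    return np.sqrt(fx**2 + fy**2)


def power_law_image(n=128, alpha=1.0, seed=0):
    """Random image whose amplitude spectrum falls off roughly as radius**(-alpha)."""
    rng = np.random.default_rng(seed)
    radius = radial_frequency(n)
    radius[0, 0] = 1.0                      # avoid division by zero at DC
    amplitude = radius ** (-alpha)
    amplitude[0, 0] = 0.0                   # zero-mean image
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(n, n))
    img = np.fft.ifft2(amplitude * np.exp(1j * phase)).real
    return img / img.std()                  # unit variance


def band_snr(signal, noise, n_bands=4):
    """Signal power divided by noise power in equal-width radial frequency bands."""
    radius = radial_frequency(signal.shape[0])
    edges = np.linspace(0.0, 0.5, n_bands + 1)
    s_pow = np.abs(np.fft.fft2(signal)) ** 2
    n_pow = np.abs(np.fft.fft2(noise)) ** 2
    snr = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (radius > lo) & (radius <= hi)   # excludes DC and the FFT corners
        snr.append(s_pow[mask].sum() / n_pow[mask].sum())
    return np.array(snr)


img = power_law_image()
noise = np.random.default_rng(1).standard_normal(img.shape)
snr = band_snr(img, noise)
print("SNR per band, low to high frequency:", snr)
```

For this synthetic image the per-band SNR falls as frequency rises, so scaling the noise up pushes the highest bands below an SNR of 1 first. An image with large flat regions and sharp edges has a different spectrum, which is one way to probe the exceptions raised in the reply: swap `power_law_image` for such an image and compare the band SNRs.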

10:14 PM · May 19, 2026 · 185 Views