Former OpenAI Sora researcher Will Depue proposes teacher-forced spectral autoregression to bypass iterative denoising in image generation
Ethan Smith notes the spectral analogy requires specific noise conditions
^left to right. please ignore the laziness with which i tweet
wait, if diffusion models are effectively implementing spectral autoregression, and image gpt models just do literally right to left autoregression, and image GPT models are taking off, why not literally just do spectral autoregression? playing with turning diffusion models into compressors right now and it’s kind of annoying to deal with the denoising process. a teacher forced spectral autoregression model could be cool. i’m sure this already exists, right? i guess there are a lot of ways to order tokens for autoregression, doesn’t need to be spectral
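A rough sketch of what "teacher forced spectral autoregression" could mean in practice, not anything from the thread itself: split the 2D FFT of an image into radial frequency bands, order them low to high, and at each step condition on the true lower bands to predict the next one. The function names (`frequency_bands`, `teacher_forced_pairs`), the band definition (rounded radial frequency), and the toy image are all assumptions made for illustration; no model is trained here.

```python
import numpy as np

def frequency_bands(n):
    """Integer radial-frequency band index for each coefficient of an n x n 2D FFT."""
    freqs = np.fft.fftfreq(n) * n  # integer frequencies: 0, 1, ..., -1
    return np.hypot(freqs[:, None], freqs[None, :]).round().astype(int)

def teacher_forced_pairs(img):
    """Yield (band, context, target) with bands ordered from low to high frequency.

    context: the TRUE coefficients of all lower bands (teacher forcing), zeros elsewhere.
    target:  the TRUE coefficients of the band a model would be asked to predict.
    Assumes a square image.
    """
    coeffs = np.fft.fft2(img)
    bands = frequency_bands(img.shape[0])
    for b in range(1, int(bands.max()) + 1):
        context = np.where(bands < b, coeffs, 0)
        target = coeffs[bands == b]
        yield b, context, target

rng = np.random.default_rng(0)
# Toy "image": doubly integrated noise, so most energy sits at low frequencies.
img = rng.standard_normal((32, 32)).cumsum(axis=0).cumsum(axis=1)

errors = []
for b, context, target in teacher_forced_pairs(img):
    # Decoding only the context gives a coarse version of the image.
    partial = np.fft.ifft2(context).real
    errors.append(float(np.mean((img - partial) ** 2)))

print(f"error with DC only: {errors[0]:.3g}, with all but the last band: {errors[-1]:.3g}")
```

Each context is a coarse-to-fine reconstruction, so the sequence plays the role that the noise schedule plays in diffusion, but every step has a ground-truth target and no denoising loop is needed.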
I tried a pretty naive form of this with a 2D FFT a while back, and several papers have come out since using wavelet decomposition with sparsity tricks.
The spectral AR analogy, I think, only really holds under specific conditions, considering what happens when images, which generally follow a power-law amplitude spectrum, are blended with white noise, which has uniform amplitude across all frequencies.
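To make the power-law versus white-noise point concrete, here is a small numerical sketch under stated assumptions: a synthetic image whose amplitude spectrum falls off as 1/f (the exponent is an assumption, real images vary), blended with Gaussian white noise as x_t = sqrt(1 - t) * x + sqrt(t) * eps (one common convention, not something specified in the thread). The helper names are hypothetical. It reports the lowest radial frequency at which noise power exceeds signal power for a few noise levels.

```python
import numpy as np

def radial_power(img):
    """Radially averaged power spectrum, indexed by integer frequency radius."""
    n = img.shape[0]  # assumes a square image
    power = np.abs(np.fft.fft2(img)) ** 2
    freqs = np.fft.fftfreq(n) * n
    radius = np.hypot(freqs[:, None], freqs[None, :]).round().astype(int)
    sums = np.bincount(radius.ravel(), weights=power.ravel())
    counts = np.bincount(radius.ravel())
    return sums / np.maximum(counts, 1)

def power_law_image(n, exponent, rng):
    """Synthetic image with amplitude spectrum ~ 1/f**exponent, zero mean, unit variance."""
    white = rng.standard_normal((n, n))
    freqs = np.fft.fftfreq(n) * n
    f = np.hypot(freqs[:, None], freqs[None, :])
    f[0, 0] = 1.0  # avoid dividing by zero at DC
    img = np.fft.ifft2(np.fft.fft2(white) / f ** exponent).real
    img -= img.mean()
    return img / img.std()

def crossover_radius(signal_power, noise_power, t):
    """First non-DC radius where noise power t*N exceeds signal power (1-t)*S.

    Returns len(signal_power) if the noise never dominates.
    """
    dominated = t * noise_power[1:] > (1.0 - t) * signal_power[1:]
    hits = np.flatnonzero(dominated)
    return int(hits[0]) + 1 if hits.size else len(signal_power)

rng = np.random.default_rng(0)
image = power_law_image(128, exponent=1.0, rng=rng)
noise = rng.standard_normal((128, 128))
S, N = radial_power(image), radial_power(noise)

levels = [0.1, 0.5, 0.9]
crossovers = [crossover_radius(S, N, t) for t in levels]
for t, r in zip(levels, crossovers):
    print(f"noise level t={t}: noise dominates from radial frequency {r} upward")
```

Under these assumptions the crossover frequency moves toward low frequencies as the noise level grows, which is the sense in which denoising looks like coarse-to-fine spectral generation. With a flatter image spectrum or non-white noise the crossover becomes less sharp, which is where the analogy weakens.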