Midjourney founder David Holz argues cheap FLOPS favor diffusion models, while Emad Mostaque proposes a hybrid training-to-inference workflow
The method converts autoregressive weights to diffusion for inference.
Train with autoregression & convert weights to diffusion for inference.
Most researchers agree that autoregression is best when memory bandwidth is cheap and diffusion is best when FLOPS are cheap. They also admit the future of compute is all FLOPS because memory scaling is hard and scaling FLOPS is easy. So why not go all in on diffusion????
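The bandwidth-versus-FLOPS claim can be made concrete with a rough roofline-style estimate. The sketch below is illustrative only and not from either post. It assumes a dense model whose weights are read once per forward pass, about 2 FLOPs per parameter per token, and 2-byte weights, and it ignores KV cache and activation traffic. The hardware figures `PEAK_FLOPS` and `MEM_BANDWIDTH` are hypothetical placeholders, not measurements of any real chip. It also assumes that batch-1 autoregressive decoding handles one token per pass, while a diffusion denoising step handles every position of the sequence in one pass.

```python
# Rough roofline estimate; all constants are hypothetical assumptions.

PEAK_FLOPS = 1e15      # hypothetical accelerator peak, FLOP/s
MEM_BANDWIDTH = 3e12   # hypothetical memory bandwidth, bytes/s


def arithmetic_intensity(tokens_per_pass: int, bytes_per_param: float = 2.0) -> float:
    """FLOPs performed per byte of weights read in one forward pass.

    Assumes ~2 FLOPs per parameter per token and that every weight is
    read exactly once per pass, so the parameter count cancels out.
    """
    return 2.0 * tokens_per_pass / bytes_per_param


def regime(intensity: float, peak_flops: float, mem_bandwidth: float) -> str:
    """Classify a workload against the hardware ridge point (FLOPs per byte)."""
    ridge = peak_flops / mem_bandwidth
    return "memory-bound" if intensity < ridge else "compute-bound"


# Batch-1 autoregressive decoding: one new token per forward pass.
ar_intensity = arithmetic_intensity(tokens_per_pass=1)

# One diffusion denoising step over a 1024-position sequence (assumed length).
diffusion_intensity = arithmetic_intensity(tokens_per_pass=1024)

ar_regime = regime(ar_intensity, PEAK_FLOPS, MEM_BANDWIDTH)
diffusion_regime = regime(diffusion_intensity, PEAK_FLOPS, MEM_BANDWIDTH)

print(f"autoregressive: {ar_intensity:.0f} FLOPs/byte, {ar_regime}")
print(f"diffusion step: {diffusion_intensity:.0f} FLOPs/byte, {diffusion_regime}")
```

Under these assumptions the autoregressive step sits far below the ridge point and is limited by memory bandwidth, while the diffusion step sits above it and is limited by compute. A diffusion model still needs many denoising steps, so it spends more total FLOPs, which is why the trade only pays off when FLOPS are the cheap resource.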
Many are saying this.