3h ago

Midjourney founder David Holz argues cheap FLOPS favor diffusion models, while Emad Mostaque proposes a hybrid training-to-inference workflow

Mostaque's proposed method trains a model autoregressively, then converts its weights to a diffusion model for inference.

Original post

Most researchers agree that autoregression is best when memory bandwidth is cheap and diffusion is best when FLOPS are cheap. They also admit the future of compute is all FLOPS because memory scaling is hard and scaling FLOPS is easy. So why not go all in on diffusion????

1:08 PM · May 27, 2026

Train with autoregression & convert weights to diffusion for inference.

David (@DavidSHolz)

Most researchers agree that autoregression is best when memory bandwidth is cheap and diffusion is best when FLOPS are cheap. They also admit the future of compute is all FLOPS because memory scaling is hard and scaling FLOPS is easy. So why not go all in on diffusion????

8:08 PM · May 27, 2026 · 39.6K Views
8:57 PM · May 27, 2026 · 3K Views

Many are saying this.

10:04 PM · May 27, 2026 · 3.1K Views
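
The bandwidth-versus-FLOPS trade-off in the original post can be made concrete with a back-of-envelope roofline estimate. The sketch below is not from the thread. It assumes batch size 1, counts only weight reads (no KV cache or activations), uses roughly 2 FLOPs per parameter per token, and treats a diffusion-style decoder as refining all tokens in parallel for a fixed number of steps. The hardware figures and the names `Hardware`, `autoregressive_time`, and `diffusion_time` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    peak_flops: float     # FLOP/s (hypothetical accelerator)
    mem_bandwidth: float  # bytes/s (hypothetical accelerator)


def step_time(flops: float, bytes_moved: float, hw: Hardware) -> float:
    """Roofline estimate: a step is limited by compute or by memory traffic."""
    return max(flops / hw.peak_flops, bytes_moved / hw.mem_bandwidth)


def autoregressive_time(params: float, bytes_per_param: float,
                        n_tokens: int, hw: Hardware) -> float:
    """One forward pass per token; every pass re-reads all weights."""
    flops = 2 * params                      # assumed ~2 FLOPs per parameter
    bytes_moved = params * bytes_per_param  # weights only, by assumption
    return n_tokens * step_time(flops, bytes_moved, hw)


def diffusion_time(params: float, bytes_per_param: float,
                   n_tokens: int, n_steps: int, hw: Hardware) -> float:
    """One forward pass per denoising step; each pass covers all tokens."""
    flops = 2 * params * n_tokens           # all tokens processed together
    bytes_moved = params * bytes_per_param  # weights read once per step
    return n_steps * step_time(flops, bytes_moved, hw)


# Made-up numbers: 1B parameters at 2 bytes each, 256 tokens, 32 denoising steps.
hw = Hardware(peak_flops=1e15, mem_bandwidth=1e12)
ar = autoregressive_time(1e9, 2, 256, hw)
diff = diffusion_time(1e9, 2, 256, 32, hw)
print(f"autoregressive: {ar:.3f} s, diffusion: {diff:.3f} s")
```

Under these assumptions the diffusion-style decoder performs 32 times more FLOPs yet finishes about 8 times sooner (roughly 0.51 s versus 0.06 s), because both schemes are memory-bound and it reads the weights 32 times instead of 256. The estimate says nothing about output quality or about how many steps a real diffusion model would need.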