Analysis argues diffusion models could scale better than autoregressive models as FLOPS cheapen relative to memory bandwidth · Digg