Google DeepMind's Mathieu Blondel argues autoregressive models outperform diffusion for discrete sequences because diffusion's token-wise softmax treats tokens as independent · Digg
12h ago
Hardware scaling trends favor FLOPS over memory bandwidth