At the core of efficient diffusion is a simple question: where is information actually resolved?
The entropy profile answers this, guiding training effort toward the regions where structure is formed. Great to see this perspective applied to continuous bitstream language diffusion.
1/?) As promised to Sander Dieleman (@sedielem), we’re excited to finally share:
Towards Closing the Autoregressive Gap in Language Modeling via Entropy-Gated Continuous Bitstream Diffusion
We show that continuous diffusion can achieve very strong language modeling performance when operating directly on bitstreams, outperforming masked and uniform diffusion baselines, and essentially matching autoregressive models under our evaluation settings.
