
Bitstream Diffusion Model Closes Gap With Autoregressive Language Models


📢 June 8 (Mon): Entropy-Gated Continuous Bitstream Diffusion for Language

🤔Diffusion language models (DLMs) promise parallel, order-agnostic generation, but on standard benchmarks they have historically lagged behind autoregressive models in sample quality and diversity.

💡Recent continuous flow and diffusion approaches over token embeddings have narrowed this gap, suggesting continuous state spaces are highly effective for language. In this work, the authors further close the autoregressive gap by modeling text as a continuous diffusion process over fixed-width binary bitstreams.

🔧Their approach represents semantic tokens as analog bit sequences and utilizes a matched-filter residual parameterization to isolate contextual learning from analytic independent-bit posteriors. Crucially, they adopt a stochastic sampler that applies Langevin-type corrections gated by the entropy-rate profile, automatically concentrating stochasticity in high-information regions while remaining nearly deterministic elsewhere.
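To make the "analog bits" idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a plain fixed-width binary code for token ids mapped to {-1, +1} (the post's "semantic" bit assignment is not specified, so it is not modeled), and a hypothetical forward process x = alpha·b + sigma·eps with Gaussian noise. Under those assumptions the per-bit posterior mean has the closed form tanh(alpha·x / sigma²), which is one plausible reading of the "analytic independent-bit posteriors" that the residual parameterization factors out. All function names are illustrative.

```python
import math

def token_to_analog_bits(token_id: int, width: int) -> list[float]:
    """Encode a token id as `width` analog bits in {-1.0, +1.0}, MSB first.

    Assumption: plain binary code; the paper's semantic bit layout is unknown.
    """
    if not 0 <= token_id < 2 ** width:
        raise ValueError("token_id does not fit in the given width")
    return [1.0 if (token_id >> i) & 1 else -1.0 for i in reversed(range(width))]

def analog_bits_to_token(bits: list[float]) -> int:
    """Decode by thresholding each (possibly noisy) analog bit at zero."""
    token_id = 0
    for b in bits:
        token_id = (token_id << 1) | (1 if b > 0 else 0)
    return token_id

def independent_bit_posterior_mean(x: float, alpha: float, sigma: float) -> float:
    """E[b | x] for x = alpha*b + sigma*eps, with b uniform on {-1, +1}
    and eps ~ N(0, 1). Assumed noise model, not taken from the paper."""
    return math.tanh(alpha * x / sigma ** 2)

# Usage: encode, then decode a clean bitstream.
bits = token_to_analog_bits(5, width=4)
decoded = analog_bits_to_token(bits)
```

A network trained on top of this would only need to learn the contextual correction to the closed-form per-bit estimate, which is the separation the post describes.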

📈On the One Billion Word Benchmark (LM1B), their 130M-parameter bitstream model reaches a generative perplexity (Gen. PPL) of 59.76 at matched real-data entropy (4.31) using 256 neural function evaluations (NFEs), decisively outperforming prior DLM baselines and reaching the autoregressive reference. On OpenWebText (OWT), the authors' stochastic sampler establishes a new continuous-DLM Pareto frontier, achieving Gen. PPL = 27.06 at an entropy of 5.26 using 4× fewer steps than previous 1024-NFE baselines.

🌍As an additional architectural benefit, bitstream diffusion removes the O(V) vocabulary scaling bottleneck shared by standard DLMs. By predicting O(log V) bitwise logits via semantic bit-patching, the model yields a reduced memory footprint and higher throughput, demonstrating a scalable paradigm for language generation as vocabulary sizes grow.
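The O(V) versus O(log V) claim can be illustrated with simple arithmetic. The sketch below compares the size of a bias-free linear output head that emits one logit per vocabulary entry with one that emits one logit per bit. The hidden width of 768 and the vocabulary size of 50,257 are assumptions chosen for illustration (they are not stated in the post), and semantic bit-patching itself is not modeled.

```python
def bits_per_token(vocab_size: int) -> int:
    """Smallest fixed width (in bits) that can index `vocab_size` tokens."""
    if vocab_size < 1:
        raise ValueError("vocab_size must be positive")
    return max(1, (vocab_size - 1).bit_length())

def output_head_params(hidden_dim: int, n_outputs: int) -> int:
    """Weight count of a bias-free linear head: hidden_dim x n_outputs."""
    return hidden_dim * n_outputs

# Hypothetical sizes, for illustration only.
HIDDEN_DIM = 768
VOCAB_SIZE = 50_257

n_bits = bits_per_token(VOCAB_SIZE)
categorical_head = output_head_params(HIDDEN_DIM, VOCAB_SIZE)  # one logit per token
bitwise_head = output_head_params(HIDDEN_DIM, n_bits)          # one logit per bit

print(f"bits per token:   {n_bits}")
print(f"categorical head: {categorical_head:,} weights")
print(f"bitwise head:     {bitwise_head:,} weights")
```

Doubling the vocabulary doubles the categorical head but adds only one output to the bitwise head, which is the scaling behavior the post refers to.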

This Monday, Georgios Batzolis (@GBatz97, https://gbatzolis.github.io/) will present his recent work “Towards Closing the Autoregressive Gap in Language Modeling via Entropy-Gated Continuous Bitstream Diffusion”.

11:04 AM · Jun 5, 2026

Collaborators: Mark Girolami (@TuringChiefSci), Luca Ambrogioni (@LucaAmb)

Paper link: https://arxiv.org/abs/2605.07013


Meeting link: https://teams.live.com/meet/935657993819?p=XomgLoX5SceKIViDSF
