D-MMD Distills Discrete Diffusion Models to Pareto Frontier in Perplexity and Diversity

Original post

New blog post on our recent paper: Beyond Single Tokens, Distilling Discrete Diffusion. D-MMD lands on the Pareto frontier of generative perplexity (gen PPL) vs. diversity, outperforming continuous diffusion distillation approaches while staying native to discrete tokens. https://ehoogeboom.github.io/post/discrete_mmd_diffusion_language_models/

6:28 AM · May 17, 2026
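A method is on the Pareto frontier when no other method achieves both a lower gen PPL and a higher diversity score. Below is a minimal Python sketch of that check. It assumes that lower gen PPL and higher diversity are both better. The model names and numbers are hypothetical placeholders, not results from the paper.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    gen_ppl: float    # generative perplexity: lower is better
    diversity: float  # diversity score: higher is better


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    no_worse = a.gen_ppl <= b.gen_ppl and a.diversity >= b.diversity
    strictly_better = a.gen_ppl < b.gen_ppl or a.diversity > b.diversity
    return no_worse and strictly_better


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Return the points not dominated by any other point, sorted by gen PPL."""
    frontier = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(frontier, key=lambda p: p.gen_ppl)


# Hypothetical numbers for illustration only; these are NOT results from the paper.
runs = [
    Point("model_a", gen_ppl=40.0, diversity=0.55),
    Point("model_b", gen_ppl=30.0, diversity=0.50),
    Point("model_c", gen_ppl=35.0, diversity=0.45),  # dominated by model_b
    Point("model_d", gen_ppl=60.0, diversity=0.60),
]
print([p.name for p in pareto_frontier(runs)])  # → ['model_b', 'model_a', 'model_d']
```

A method that trades a little gen PPL for more diversity (model_d here) can still sit on the frontier. This is why the post frames the result as a trade-off curve rather than a single score.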

Why this is non-trivial: continuous diffusion distillation has a deterministic flow map to lean on. Discrete diffusion operates on categorical tokens, and there is no comparably simple object, such as a flow map, to distill.

Emiel Hoogeboom @emiel_hoogeboom
1:28 PM · May 17, 2026 · 282 Views
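Here is a hypothetical toy example of the contrast. It is not taken from the paper or the post. In continuous diffusion, a deterministic flow map sends each noise sample to exactly one output, so a student can be trained to reproduce the teacher's output for the same noise. With categorical tokens, a sampler that draws several tokens in one parallel step from independent per-token distributions cannot represent correlations between those tokens. The Python sketch computes this exactly for an assumed two-token dataset.

```python
import itertools

# Hypothetical toy data distribution over two-token sequences:
# only "AA" and "BB" occur, each with probability 0.5.
data = {("A", "A"): 0.5, ("B", "B"): 0.5}
vocab = ["A", "B"]


def marginal(dist, position):
    """Per-position categorical marginal p(x_i = v)."""
    out = {v: 0.0 for v in vocab}
    for seq, p in dist.items():
        out[seq[position]] += p
    return out


# A one-step sampler that predicts each token independently can at best match
# the per-token marginals; its joint distribution is then their product.
m0, m1 = marginal(data, 0), marginal(data, 1)
factorized = {(a, b): m0[a] * m1[b] for a, b in itertools.product(vocab, repeat=2)}

# Probability mass the factorized sampler places on sequences the data never produces.
invalid_mass = sum(p for seq, p in factorized.items() if seq not in data)
print(invalid_mass)  # → 0.5
```

Even with perfect per-token marginals, half of the factorized samples ("AB" and "BA") never occur in the data. Any few-step discrete sampler has to account for this gap.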

The takeaway: you don't need to map discrete generation back into continuous space; you can distill the discrete process directly.

Post: https://ehoogeboom.github.io/post/discrete_mmd_diffusion_language_models/
Paper: https://arxiv.org/abs/2603.20155
Unofficial small-scale implementation: https://github.com/ehoogeboom/discrete-diffusion-lm

1:28 PM · May 17, 2026 · 238 Views
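The name D-MMD and the post's URL suggest that the method builds on maximum mean discrepancy (MMD), a kernel-based distance between distributions. This thread does not describe the paper's actual loss, kernel, or training setup. The following is therefore only a generic Python sketch of squared MMD between two distributions over short token sequences. The kernel choice (exp(-gamma · Hamming distance)), the `mmd2` helper, and the teacher/student distributions are all hypothetical.

```python
import math


def hamming(x: str, y: str) -> int:
    """Number of positions where two equal-length token sequences differ."""
    return sum(a != b for a, b in zip(x, y))


def kernel(x: str, y: str, gamma: float = 1.0) -> float:
    """Gaussian kernel on concatenated one-hot encodings: exp(-gamma * Hamming).
    Hamming distance equals half the squared Euclidean distance between the
    one-hot vectors, so this is a valid positive-definite kernel."""
    return math.exp(-gamma * hamming(x, y))


def mmd2(p: dict[str, float], q: dict[str, float], gamma: float = 1.0) -> float:
    """Population squared MMD between two distributions on a finite set:
    E_{p,p}[k] + E_{q,q}[k] - 2 * E_{p,q}[k]."""
    def expect(a: dict[str, float], b: dict[str, float]) -> float:
        return sum(pa * pb * kernel(x, y, gamma)
                   for x, pa in a.items() for y, pb in b.items())
    return expect(p, p) + expect(q, q) - 2 * expect(p, q)


# Hypothetical teacher: only "AA" and "BB" occur.
# Hypothetical student: tokens drawn independently, so all four pairs are equally likely.
teacher = {"AA": 0.5, "BB": 0.5}
student = {"AA": 0.25, "AB": 0.25, "BA": 0.25, "BB": 0.25}
print(mmd2(teacher, teacher))      # → 0.0
print(mmd2(teacher, student) > 0)  # → True
```

A design note on the kernel: a linear kernel on one-hot vectors only compares tokens position by position. With that kernel, these two distributions would be indistinguishable, because their per-token marginals match. A kernel on whole sequences, like the one above, also detects the missing correlation between tokens.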