New blog post on our recent paper: Beyond Single Tokens, Distilling Discrete Diffusion.
D-MMD lands on the Pareto frontier of gen PPL vs. diversity, outperforming continuous diffusion distillation approaches — while staying native to discrete tokens.

Why this is non-trivial: continuous diffusion distillation has a deterministic flow map to lean on. Discrete diffusion works over categorical tokens, with no analogous deterministic map to distill against.
The takeaway: you don't need to map discrete generation back into continuous space. You can distill the discrete process directly.

Post: https://ehoogeboom.github.io/post/discrete_mmd_diffusion_language_models/
Paper: https://arxiv.org/abs/2603.20155
Unofficial small-scale implementation: https://github.com/ehoogeboom/discrete-diffusion-lm