Okay, even more interesting: DiffusionGemma is a “loopholed” diffusion model!
Discrete diffusion usually hits the sampling wall:
at each step the model has a rich distribution over tokens, then sampling crushes it into one hard token.
A lot of previously computed belief disappears. But DiffusionGemma keeps the previous logits alive.
So it denoises from the token AND from the belief behind the token.
That’s the idea behind “Loopholed Discrete Diffusion”, a paper I was playing with this week. Exciting to see this at scale!
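To make the "token AND belief" part concrete, here is a toy NumPy sketch. It is not DiffusionGemma's code and not the paper's architecture: `toy_denoiser`, the random embedding table `E`, the projection `W`, and the `carry_belief` flag are all made-up stand-ins, and the "network" is a single matrix multiply. It only shows the data flow: with `carry_belief=False` each step sees just the hard sampled tokens, and with `carry_belief=True` it also sees a soft embedding built from the previous step's logits.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def toy_denoiser(token_emb, belief_emb, W):
    # Hypothetical stand-in for the real network: one linear map
    # applied to the sum of the two inputs.
    return (token_emb + belief_emb) @ W

def denoise(steps, seq_len, vocab, dim, carry_belief, seed=0):
    rng = np.random.default_rng(seed)
    E = rng.normal(size=(vocab, dim))   # toy token embeddings (assumption)
    W = rng.normal(size=(dim, vocab))   # toy output projection (assumption)

    # Uniform-state style start: every position holds a random token.
    tokens = rng.integers(vocab, size=seq_len)
    # No belief yet, so zero logits (a uniform distribution after softmax).
    prev_logits = np.zeros((seq_len, vocab))

    for _ in range(steps):
        token_emb = E[tokens]  # what survives the sampling wall
        if carry_belief:
            # Soft embedding: expectation of E under the previous distribution.
            belief_emb = softmax(prev_logits) @ E
        else:
            # Plain discrete diffusion: the previous distribution is discarded.
            belief_emb = np.zeros_like(token_emb)

        logits = toy_denoiser(token_emb, belief_emb, W)
        probs = softmax(logits)
        # The sampling wall: each rich distribution collapses to one hard token.
        tokens = np.array([rng.choice(vocab, p=p) for p in probs])
        prev_logits = logits  # kept alive for the next step

    return tokens, prev_logits
```

Calling `denoise(steps=4, seq_len=6, vocab=10, dim=8, carry_belief=True)` returns the final tokens and the last logits. In a real model the belief would presumably go through learned layers rather than a plain sum, so treat this strictly as an illustration of the idea.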
So DiffusionGemma is a 26B uniform-state diffusion language model! Very interesting.
This might be one of the first open-weight releases of this kind at this scale. (Not forgetting @dvruette’s public 10B version)


