Text diffusion models are fast but less transparent than today's LLMs, because they run many forward passes before outputting any text.
We audit the transparency of DiffusionGemma and find that the intermediates are interpretable. This recovers many of the benefits of CoT!
🧵

