
DiffusionGemma audit shows its intermediate representations are human-interpretable, mimicking autoregressive chain-of-thought transparency

DeepMind's Rohin Shah says it helps assess latent reasoning.


Original post

Rohin Shah @rohinmshah

Though the project itself is about the transparency of DiffusionGemma, I'm most excited about this as an example of what it could look like to assess the transparency of latent reasoning models, in a manner that lets us compare to autoregressive CoT

Josh Engels @JoshAEngels

Text diffusion models are fast, but are less transparent than today's LLMs because they do many forward passes before outputting text.

We audit the transparency of DiffusionGemma and find that the intermediates are interpretable. This recovers many of the benefits of CoT!

🧵

9:32 AM · Jun 22, 2026 · 2.3K Views

Sentiment

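The one comment with sentiment is positive. The commenter argues that results like the DiffusionGemma audit should lead 2027-era models to put non-negligible probability on the existence of interpretability techniques that can detect internal thoughts, which would lower the expected value of attempts at scheming.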

Positive: 100.0%

Negative: 0.0%

1 comment with sentiment.


Posts from X

Most Activity

Views: 35 · Likes: 1

davidad 🎇 @davidad

@rohinmshah Another reason to be excited is that the existence of results like this should update 2027-era models’ priors to put non-negligible probability mass on the existence of interp techniques that can detect any internal thoughts, making attempts at scheming even lower EV for them
