
Residual Prediction With Noisy Vectors Mimics Diffusion Denoising In Transformers


Original post

did y'all know that next-residual prediction can work even when the residual "codes" are noisy vectors with no inherent per-level structure? 20M-param transformer, 256 vision patches. (the dotted line marks the "prefill" point, where i start from the first few patches of the reconstruction)
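Here is a minimal sketch of what "residual codes made of noisy vectors" could look like. It assumes a plain greedy residual quantizer: each level has its own codebook of i.i.d. Gaussian vectors, and at every level the encoder picks the codeword closest to whatever residual is left. Several details are illustrative assumptions and do not come from the post: the helper names (`make_random_codebooks`, `residual_encode`, `residual_decode`), the 0.5-per-level scale schedule, and the all-zero codeword in each book. The zero codeword guarantees that the greedy step never increases the error.

```python
import numpy as np


def make_random_codebooks(num_levels, codebook_size, dim, seed=0):
    """Hypothetical helper: one i.i.d. Gaussian codebook per level.

    Assumptions (not from the post): level l is scaled by 0.5**l, and row 0
    of every codebook is the zero vector so a greedy step can never make the
    reconstruction worse.
    """
    rng = np.random.default_rng(seed)
    books = rng.standard_normal((num_levels, codebook_size, dim))
    books *= (0.5 ** np.arange(num_levels))[:, None, None]
    books[:, 0, :] = 0.0
    return books


def residual_encode(x, codebooks):
    """Greedy coarse-to-fine residual quantization of a single vector x.

    Returns the chosen index per level and the squared error after each
    level (entry 0 is the error before any codeword is added).
    """
    residual = np.asarray(x, dtype=np.float64).copy()
    codes = []
    sq_errors = [float(np.sum(residual ** 2))]
    for book in codebooks:
        # Squared distance from the current residual to every codeword.
        dists = np.sum((residual - book) ** 2, axis=1)
        idx = int(np.argmin(dists))
        codes.append(idx)
        # Whatever the chosen codeword fails to explain is left for the next level.
        residual = residual - book[idx]
        sq_errors.append(float(np.sum(residual ** 2)))
    return codes, sq_errors


def residual_decode(codes, codebooks):
    """Reconstruction is the sum of the chosen codeword at every level."""
    return sum(codebooks[level][idx] for level, idx in enumerate(codes))


# Usage: encode one random 16-dim "patch" with 8 levels of 256 codewords.
books = make_random_codebooks(num_levels=8, codebook_size=256, dim=16)
x = np.random.default_rng(1).standard_normal(16)
codes, sq_errors = residual_encode(x, books)
x_hat = residual_decode(codes, books)
```

The codewords themselves carry no meaning. Any coarse-to-fine ordering comes from the greedy choice against a shrinking residual. The scale schedule is a mild per-level prior that the post's setup may not have used. With unscaled books, the encoder would fall back on the zero codeword more often once the residual gets small.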

kalomaze@kalomaze

@willccbb @DimitrisPapail imo a problem is people rushing to do 1D categoricals, à la VQ-VAE. multi-categorical prediction for truly higher-dim data in the same patch, where you model residuals conditioned coarse->fine, is underexplored. tried it once with random vector "codebooks" and it seemed to work
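Modeling "residuals conditioned coarse->fine" turns each level into a categorical prediction over codebook indices. Below is a hypothetical sketch of the teacher-forcing pairs for one patch. It assumes the context at level k is the running sum of the codewords already chosen at levels below k. Conditioning on the raw index history would be an equally plausible choice. The post doesn't say how levels are interleaved across the 256 patches, so this covers a single patch only. The function name `next_residual_targets` is illustrative.

```python
import numpy as np


def next_residual_targets(codes, codebooks):
    """Hypothetical teacher-forcing pairs for next-residual prediction.

    codes:      chosen codeword index per level for one patch
    codebooks:  array of shape (num_levels, codebook_size, dim)
    Returns (context, target) pairs: context is the running reconstruction
    built from the coarser levels < k, target is the index to predict at k.
    """
    dim = codebooks.shape[-1]
    running = np.zeros(dim)
    pairs = []
    for level, idx in enumerate(codes):
        pairs.append((running.copy(), int(idx)))
        # Teacher forcing: add the ground-truth codeword before the next level.
        running = running + codebooks[level][idx]
    return pairs


# Tiny hand-built example: 2 levels, 3 codewords each, 2-dim vectors.
books = np.array([
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],   # level 0
    [[0.0, 0.0], [0.5, 0.5], [-0.5, 0.0]],  # level 1
])
pairs = next_residual_targets([1, 2], books)
# pairs[0] -> (array([0., 0.]), 1)
# pairs[1] -> (array([1., 0.]), 2)
```

A transformer trained on these pairs would output a softmax over the codebook entries at each step. At inference, the sampled codeword would be added to the running sum before the next level is predicted.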

4:05 PM · Jun 21, 2026 · 3.1K Views


Posts from X


kalomaze@kalomaze

you can think of this intuitively as another way of exposing the diffusion intuition of progressive denoising: each vector is noisy, but the cumulative sum progressively pulls you closer to the latent structure of a lower-dimensional data manifold
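The cumulative-sum intuition can be made precise with a telescoping identity. The notation here is ours, not the post's. $x$ is a target patch vector, $c_l$ is the codeword chosen at level $l$, $r_l$ is the residual left after $l$ levels, and $\hat{x}_k$ is the partial reconstruction. We assume greedy selection $c_l = \arg\min_{c \in \mathcal{C}_l} \|r_{l-1} - c\|$ from codebooks $\mathcal{C}_l$ that each contain the zero vector:

```latex
\begin{aligned}
r_0 &= x, \qquad r_l = r_{l-1} - c_l, \qquad \hat{x}_k = \sum_{l=1}^{k} c_l,\\
x - \hat{x}_k &= r_k,\\
\|x - \hat{x}_k\| &= \|r_k\| \le \|r_{k-1}\| \le \cdots \le \|r_0\| = \|x\|.
\end{aligned}
```

The last line holds because the zero codeword is always a candidate, so $\|r_l\| \le \|r_{l-1} - 0\|$. The error of each partial sum is exactly the remaining residual. In that sense, the sequence $\hat{x}_1, \hat{x}_2, \dots$ loosely resembles the progressively cleaner estimates along a diffusion trajectory. This is an analogy, not a formal equivalence.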
