Residual Prediction With Noisy Vectors Mimics Diffusion Denoising In Transformers · Digg