DriftXpress cuts training costs for drifting diffusion models
DriftXpress uses the Nyström approximation, built from a subsample of pre-computed landmarks drawn from real and attractor data, to train drifting diffusion models at lower computational cost while preserving sample quality. Side-by-side CIFAR-10 experiments show earlier formation of recognizable objects, faster wall-clock convergence, and stronger FID reduction than prior drifting baselines that rely on mini-batch summaries.
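The blurb doesn't spell out the approximation itself, so here is a minimal sketch of the generic low-rank Nyström construction from landmarks: approximate an n×n kernel matrix K as C W⁺ Cᵀ, where C is the n×m cross-kernel against m landmark points and W is the m×m landmark kernel. The `rbf_kernel` choice, `gamma`, the landmark count, and the random stand-in data are all illustrative assumptions, not DriftXpress's actual kernel or training loop.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise squared distances, then RBF kernel values (assumed kernel, for illustration).
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def nystrom_factor(X, landmarks, gamma=1.0, eps=1e-6):
    """Return a low-rank factor F such that F @ F.T ~= K(X, X)."""
    C = rbf_kernel(X, landmarks, gamma)          # n x m cross-kernel
    W = rbf_kernel(landmarks, landmarks, gamma)  # m x m landmark kernel
    # Symmetric pseudo-inverse square root of W via eigendecomposition.
    vals, vecs = np.linalg.eigh(W)
    keep = vals > eps
    W_isqrt = vecs[:, keep] @ np.diag(vals[keep] ** -0.5) @ vecs[:, keep].T
    return C @ W_isqrt                           # n x m factor; F @ F.T = C W^+ C^T

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))                   # stand-in for real + attractor samples
idx = rng.choice(500, size=50, replace=False)    # pre-computed landmark subsample
F = nystrom_factor(X, X[idx])
K_approx = F @ F.T                               # rank <= 50 approximation of the 500x500 kernel
```

The win is that products against K cost O(nm) through the factor F instead of O(n²), which is the kind of saving that would let the landmark summaries stand in for full-dataset comparisons during training.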
Ali did some amazing work on the hottest new generative model: drifting models (from Mingyang Deng @Goodeat258 et al., out of Kaiming He's group).
Speeds up training a lot using a low-rank Nyström approximation. Check out Ali's full thread. Paper and code available!
[1/6] Diffusion models are slow at inference. Drifting Models fix that, but then training becomes the bottleneck. We asked: can we slash the training cost of drifting models without sacrificing quality? Our answer: DriftXpress. 🧵
Paper and code:
> these summaries cover the entire training set

oh cool, so theoretically the summaries could act as a non-parametric generator? and DriftXpress kinda distills from it?