Paper introduces flow language models on hyperspheres
The paper Language Modeling with Hyperspherical Flows presents flow language models that rotate token embeddings on the unit hypersphere instead of adding Gaussian noise. Training draws uniform noise on the sphere, applies SLERP interpolation, and optimizes cross-entropy loss on the posterior. Sampling runs Euler steps along tangent vectors for N-1 iterations. The approach targets the directional structure of discrete text embeddings.
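The steps named in the summary (uniform noise on the sphere, SLERP interpolation, Euler steps along tangent vectors) can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation. The time convention (noise at t=0, clean embedding at t=1), the velocity formula, and every function name here, including the model stand-in `predict_clean`, are assumptions made for the example.

```python
import numpy as np

def sample_uniform_sphere(rng, shape):
    # Normalized Gaussian draws are uniformly distributed on the unit sphere.
    x = rng.standard_normal(shape)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def slerp(x0, x1, t):
    # Geodesic interpolation between unit vectors: x0 at t=0, x1 at t=1.
    # Assumes x0 and x1 are neither identical nor antipodal.
    cos_omega = np.clip(np.sum(x0 * x1, axis=-1, keepdims=True), -1.0, 1.0)
    omega = np.arccos(cos_omega)
    return (np.sin((1.0 - t) * omega) * x0 + np.sin(t * omega) * x1) / np.sin(omega)

def log_map(x, y):
    # Tangent vector at x pointing along the geodesic to y; its length is the angle.
    cos_omega = np.clip(np.sum(x * y, axis=-1, keepdims=True), -1.0, 1.0)
    omega = np.arccos(cos_omega)
    direction = y - cos_omega * x
    norm = np.linalg.norm(direction, axis=-1, keepdims=True)
    return omega * direction / np.maximum(norm, 1e-12)

def exp_map(x, u):
    # Move from x along tangent vector u while staying on the sphere.
    norm = np.linalg.norm(u, axis=-1, keepdims=True)
    return np.cos(norm) * x + np.sin(norm) * u / np.maximum(norm, 1e-12)

def euler_sample(predict_clean, x_start, num_points):
    # num_points time values in [0, 1] give num_points - 1 Euler steps.
    ts = np.linspace(0.0, 1.0, num_points)
    x = x_start
    for t, t_next in zip(ts[:-1], ts[1:]):
        target = predict_clean(x, t)  # hypothetical model call (assumption)
        velocity = log_map(x, target) / (1.0 - t)  # assumed velocity form
        x = exp_map(x, (t_next - t) * velocity)
    return x

rng = np.random.default_rng(0)
noise = sample_uniform_sphere(rng, (4, 16))
clean = sample_uniform_sphere(rng, (4, 16))  # stand-in for unit-norm token embeddings
noisy = slerp(noise, clean, 0.3)             # a training-style corrupted input
sampled = euler_sample(lambda x, t: clean, noise, 8)  # oracle predictor, 7 steps
```

In the usage lines, an oracle that always returns the clean embedding replaces the trained network, so the sampler simply walks the geodesic from noise to the target. A real model would instead predict a posterior over tokens at each step, which the summary says is trained with cross-entropy.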
Today in continuous diffusion language models, we have:
- Spherical flows https://arxiv.org/abs/2605.05629
- Hyperspherical flows https://arxiv.org/abs/2605.11125
Another case of convergent evolution! Two different takes on the same core idea, published within days of each other.
🔥 New paper: Language Modeling with Hyperspherical Flows Recent flow language models (FLMs) all use Gaussian noise. Makes sense for images, but not necessarily for text 🫠 We propose to add noise by rotating embeddings on 𝕊^{d−1} instead 🌐 w/ @caglarml (1/9)
Continuous diffusion/flow models have been very successful for image generation, but for language they are still in their early days, and this work pushes the area in an important direction.
Our key insight in this paper: use the geometry of embeddings to inject noise, rather than borrowing Gaussian corruption from images.
Very proud to have supervised and collaborated on this project with @jdeschena. Great execution in a very short amount of time 👏
Also @sedielem, @ziyuwang and @NandoDF you might be interested in this work.
@sedielem We really need to work on better benchmarking now