"Latent Reasoning with Normalizing Flows"
NF-CoT makes latent reasoning feel native to LLMs. So instead of forcing every intermediate thought through verbose CoT text, it learns compact continuous thoughts with a normalizing flow inside the causal LLM stream.
The key move is that latent thoughts become sampleable, scoreable, and RL-trainable like tokens, with exact likelihoods and KV-cache friendly decoding.
This beats explicit CoT and prior latent methods, while using 64 latent tokens to compress roughly 385 CoT tokens and running much faster than diffusion-based latent reasoning.







