Jiatao Gu releases NF-CoT, using normalizing flows for continuous latent LLM reasoning instead of discrete text tokens

The system supports GRPO training and KV-cache decoding.

alphaXiv@askalphaxiv

"Latent Reasoning with Normalizing Flows"

NF-CoT makes latent reasoning feel native to LLMs. So instead of forcing every intermediate thought through verbose CoT text, it learns compact continuous thoughts with a normalizing flow inside the causal LLM stream.

The key move is that latent thoughts become sampleable, scoreable, and RL-trainable like tokens, with exact likelihoods and KV-cache friendly decoding.

This beats explicit CoT and prior latent methods, while using 64 latent tokens to compress roughly 385 CoT tokens and running much faster than diffusion-based latent reasoning.
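The "exact likelihoods" claim rests on the standard change-of-variables identity for normalizing flows. As a reminder, and not as the paper's specific parameterization: for an invertible map f that sends a latent x to a base variable z = f(x) with a simple density p_Z (the symbols f, p_X, and p_Z are chosen here for illustration),

```latex
\log p_X(x) \;=\; \log p_Z\bigl(f(x)\bigr) \;+\; \log\left|\det \frac{\partial f(x)}{\partial x}\right|
```

Because both terms can be computed in closed form when the Jacobian is triangular, a sampled latent thought can be scored the same way a sampled token is.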

9:54 AM · Jun 8, 2026 · 18.5K Views

Jiatao Gu@thoma_gu

🤔Can LLMs reason by sampling continuous thoughts — not just tokens?

Introducing NF-CoT: Latent Reasoning with Normalizing Flows. It samples continuous chains of thought directly in the LLM's stream with exact likelihood, powered by STARFlow.

🌐Page: http://nf-cot.vercel.app

Murray Kang@haoqik322

Excited to share our follow-up work to LaDiR: Latent Reasoning with Normalizing Flows (NF-CoT)!

Instead of iterative diffusion denoising, NF-CoT integrates STARFlow to generate continuous thoughts autoregressively in LLMs, like tokens — with exact likelihood, KV-cache-friendly decoding, and compatibility with policy-gradient RL training such as GRPO.

Jiatao Gu@thoma_gu

This work was led by two amazing @PennEngineers master’s students, @Guancheng_Tu @EthanFu0355525.

Also huge thanks to our great collaborators @SuhaoYu1020, @tyao923, @haoqik322, @Lianhuiq, and @YizheZhangNLP!

More details: http://arxiv.org/abs/2606.06447
Code & Model: coming soon.

Jiatao Gu@thoma_gu

NF-CoT is also cheaper than LaDiR.

Inference: 1.9× faster end-to-end, with 2.5× fewer FLOPs/sample.
Training: 2.85× higher sample throughput, with 6.66× fewer FLOPs.

-> Latent thoughts are generated like tokens — autoregressively, cache-friendly, without iterative denoising.

Jiatao Gu@thoma_gu

Prior latent-CoT methods each make a trade-off. For example:
• Coconut-style feedback is efficient, but mostly deterministic.
• LaDiR-style diffusion latents are stochastic, but iterative and likelihood-intractable.

NF-CoT keeps the sweet spot -- stochasticity + likelihood + efficiency.

Jiatao Gu@thoma_gu

🧐Why is this hard?

Simply moving CoT beyond tokens is not enough. A useful latent reasoning space should still keep what makes token CoT powerful:
• sampling diverse trajectories
• scoring probabilistically
• training with likelihood
• decoding efficiently with KV caches

Jiatao Gu@thoma_gu

Results on Qwen3-8B-Base across 5 code benchmarks:

Avg pass@1 improves from 55.8 → 68.8 (+13.0), and reaches 70.1 after RL.

NF-CoT also outperforms the strongest latent baseline, LaDiR, by +7.1%. On MBPP+, pass@1 = 72.1 — matching the base model’s pass@128.
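For readers comparing the pass@1 and pass@128 figures above: the post does not say how pass@k was estimated. A common choice on code benchmarks is the unbiased estimator introduced with HumanEval, sketched below. Whether NF-CoT's evaluation uses exactly this estimator is an assumption, and the function name `pass_at_k` and the sample counts in the usage note are illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed the tests
    k: budget being evaluated (k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k samples failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples are failures), drawing without replacement.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With made-up counts, `pass_at_k(10, 3, 1)` is the plain success rate of 3 in 10, while larger k rewards a model for solving the problem at least once within the budget. A benchmark score is then the mean of this value over all problems.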

Jiatao Gu@thoma_gu

Moreover, the exact likelihood over both latent reasoning and text answers makes NF-CoT compatible with GRPO-style post-training with verifiable rewards — like explicit CoT, but in the continuous space and pluggable into existing RL frameworks.
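To make the GRPO compatibility concrete, here is a minimal sketch of the group-relative advantage step, assuming each sampled trajectory comes with a scalar verifiable reward and a total log-probability (latent thoughts plus text answer). It deliberately omits the clipped probability ratio and KL penalty used in full GRPO, and the function names are hypothetical rather than taken from the NF-CoT code, which has not been released.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt."""
    r = np.asarray(rewards, dtype=np.float64)
    # Each sample is scored relative to its siblings, so no value network is needed.
    return (r - r.mean()) / (r.std() + eps)


def policy_gradient_surrogate(advantages, trajectory_logprobs):
    """Simplified surrogate loss: -mean(A_i * log p(trajectory_i)).

    trajectory_logprobs is assumed to already sum the latent-thought
    log-density and the answer-token log-probabilities.
    """
    a = np.asarray(advantages, dtype=np.float64)
    lp = np.asarray(trajectory_logprobs, dtype=np.float64)
    return float(-(a * lp).mean())
```

The point of the sketch is that the surrogate only needs a log-probability per trajectory. A latent method without a tractable likelihood cannot supply that term directly.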

Jiatao Gu@thoma_gu

The whole model is trained end-to-end with NLL over both latent thoughts and text answers.

At inference time, it reasons autoregressively in the deep reasoning space, while still allowing latent thoughts to be inspected or decoded for better explainability.
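One way to read "NLL over both latent thoughts and text answers" is as a sum of two terms: the negative log-density of the latent thoughts under the flow, and the usual cross-entropy over answer tokens. The sketch below assumes an unweighted sum and takes the latent log-density as a precomputed number. The actual weighting and batching in NF-CoT are not stated in the post, and `token_nll` and `joint_nll` are hypothetical names.

```python
import numpy as np


def token_nll(logits, targets):
    """Summed cross-entropy for a sequence.

    logits: array of shape (T, V), unnormalized scores per position
    targets: length-T sequence of integer token ids
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    picked = log_probs[np.arange(len(targets)), targets]
    return float(-picked.sum())


def joint_nll(latent_log_density, logits, targets):
    """Assumed objective: -log p(latents) - log p(answer tokens | latents)."""
    return -float(latent_log_density) + token_nll(logits, targets)
```

For instance, with uniform logits over a 4-token vocabulary and 3 answer tokens, the text term is 3·log 4, and the latent term simply adds on top of it.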

Jiatao Gu@thoma_gu

💡The idea: place a STARFlow inside the LLM!

STARFlow is a SOTA normalizing flow built from Deep-Shallow Autoregressive Transformers.

In NF-CoT, shallow invertible layers map continuous latents into a deep reasoning space, where the LLM models them left-to-right alongside text.
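As a toy illustration of what an invertible autoregressive layer with an exact likelihood looks like, here is a single affine autoregressive flow in NumPy. This is not STARFlow or the NF-CoT architecture: the linear conditioners `W_mu` and `W_s`, the standard-normal base density, and all function names are assumptions made for the sketch.

```python
import numpy as np


def make_params(dim, seed=0):
    """Strictly lower-triangular weights, so entry i only sees x[:i]."""
    rng = np.random.default_rng(seed)
    W_mu = np.tril(rng.normal(scale=0.5, size=(dim, dim)), k=-1)
    W_s = np.tril(rng.normal(scale=0.1, size=(dim, dim)), k=-1)
    return W_mu, W_s


def forward(x, W_mu, W_s):
    """Map x to z in one parallel pass and return log|det dz/dx|."""
    mu, s = W_mu @ x, W_s @ x
    z = (x - mu) * np.exp(-s)
    # The Jacobian is triangular with diagonal exp(-s_i).
    return z, -s.sum()


def inverse(z, W_mu, W_s):
    """Recover x from z one coordinate at a time (sampling direction)."""
    x = np.zeros_like(z, dtype=np.float64)
    for i in range(len(z)):
        # Rows are zero at and beyond column i, so only x[:i] contributes.
        x[i] = z[i] * np.exp(W_s[i] @ x) + W_mu[i] @ x
    return x


def log_prob(x, W_mu, W_s):
    """Exact log-density under a standard-normal base distribution."""
    z, log_det = forward(x, W_mu, W_s)
    log_base = -0.5 * (z @ z + len(z) * np.log(2 * np.pi))
    return log_base + log_det
```

Sampling draws z from the base distribution and calls `inverse`, and scoring calls `log_prob`. Both are exact, which is the property the thread contrasts with iterative diffusion denoising.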
