
Jiatao Gu releases NF-CoT, using normalizing flows for continuous latent LLM reasoning instead of discrete text tokens

The system supports GRPO training and KV-cache decoding.

Original post · Lianhui Qin #772
alphaXiv @askalphaxiv

"Latent Reasoning with Normalizing Flows"

NF-CoT makes latent reasoning feel native to LLMs. So instead of forcing every intermediate thought through verbose CoT text, it learns compact continuous thoughts with a normalizing flow inside the causal LLM stream.

The key move is that latent thoughts become sampleable, scoreable, and RL-trainable like tokens, with exact likelihoods and KV-cache friendly decoding.

This beats explicit CoT and prior latent methods, while using 64 latent tokens to compress roughly 385 CoT tokens and running much faster than diffusion-based latent reasoning.

9:54 AM · Jun 8, 2026 · 18.5K Views
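
As a quick sanity check on the compression claim in the post above, the ratio of roughly 385 CoT tokens to 64 latent tokens works out to about 6×. The variable names are illustrative only, and the 385 figure is the post's approximate number.

```python
cot_tokens = 385      # approximate explicit CoT length quoted in the post
latent_tokens = 64    # latent thought budget quoted in the post

# Ratio of text reasoning positions to latent reasoning positions.
ratio = cot_tokens / latent_tokens
print(f"about {ratio:.1f}x fewer reasoning positions")  # about 6.0x
```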
Jiatao Gu @thoma_gu

🤔Can LLMs reason by sampling continuous thoughts — not just tokens?

Introducing NF-CoT: Latent Reasoning with Normalizing Flows. It samples continuous chain-of-thoughts directly in the stream of LLM with exact likelihood -- powered by STARFlow.

🌐Page: http://nf-cot.vercel.app

1d · Views 19.2K · Likes 102 · Bookmarks 63
Murray Kang @haoqik322

Excited to share our follow-up to LaDiR: Latent Reasoning with Normalizing Flows (NF-CoT)!

Instead of iterative diffusion denoising, NF-CoT integrates STARFlow to generate continuous thoughts autoregressively in LLMs, like tokens — with exact likelihood, KV-cache-friendly decoding, and compatibility with policy-gradient RL training such as GRPO.

1d · Views 2.7K · Likes 23 · Bookmarks 12
Jiatao Gu @thoma_gu

This work was led by two amazing @PennEngineers master’s students, @Guancheng_Tu @EthanFu0355525.

Also huge thanks to our great collaborators @SuhaoYu1020, @tyao923, @haoqik322, @Lianhuiq, and @YizheZhangNLP!

More details: http://arxiv.org/abs/2606.06447
Code & Model: coming soon.

1d · Views 534 · Likes 8 · Bookmarks 4
Murray Kang @haoqik322

Excited to share our follow-up to LaDiR: Latent Reasoning with Normalizing Flows (NF-CoT)!

Instead of iterative diffusion denoising, NF-CoT uses normalizing flows to generate continuous thoughts autoregressively, like tokens — with exact likelihood, KV-cache-friendly decoding, and compatibility with policy-gradient RL training such as GRPO.

1d · Views 864 · Likes 8 · Bookmarks 2
Jiatao Gu @thoma_gu

Prior latent-CoT methods trade off. For example:
• Coconut-style feedback is efficient, but mostly deterministic.
• LaDiR-style diffusion latents are stochastic, but iterative and likelihood-intractable.

NF-CoT keeps the sweet spot -- stochasticity + likelihood + efficiency.

1d · Views 630 · Likes 4 · Bookmarks 0
Jiatao Gu @thoma_gu

🧐Why is this hard?

Simply moving CoT beyond tokens is not enough. A useful latent reasoning space should still keep what makes token CoT powerful:
• sampling diverse trajectories
• scoring probabilistically
• training with likelihood
• decoding efficiently with KV caches

1d · Views 1.1K · Likes 5 · Bookmarks 0
Jiatao Gu @thoma_gu

Results on Qwen3-8B-Base across 5 code benchmarks:

Avg pass@1 improves from 55.8 → 68.8 (+13.0), and reaches 70.1 after RL.

NF-CoT also outperforms the strongest latent baseline, LaDiR, by +7.1%. On MBPP+, pass@1 = 72.1 — matching the base model’s pass@128.

1d · Views 209 · Likes 2 · Bookmarks 0
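
The results post compares pass@1 against the base model's pass@128. The thread does not say how pass@k was computed, so the sketch below is an assumption: it uses the unbiased estimator that is commonly applied to code benchmarks, where `n` samples are drawn per problem and `c` of them pass the tests. The function name and the example numbers are illustrative and not taken from the paper.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without
    replacement from n generations of which c are correct, passes."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset has a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical problem: 128 generations, 16 of them correct.
print(pass_at_k(128, 16, 1))    # equals c / n = 0.125
print(pass_at_k(128, 16, 128))  # 1.0, since at least one sample is correct

# The reported average gain is a simple difference in pass@1 points.
print(round(68.8 - 55.8, 1))    # 13.0
```

Read this way, a pass@1 that matches the base model's pass@128 means one NF-CoT sample is as likely to pass as the best of 128 base-model samples.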
Jiatao Gu @thoma_gu

NF-CoT is also cheaper than LaDiR.

Inference: 1.9× faster end-to-end, with 2.5× fewer FLOPs/sample. Training: 2.85× higher sample throughput, with 6.66× fewer FLOPs.

-> Latent thoughts are generated like tokens — autoregressively, cache-friendly, without iterative denoising.

1d · Views 436 · Likes 3 · Bookmarks 0
Jiatao Gu @thoma_gu

The whole model is trained end-to-end with NLL over both latent thoughts and text answers.

At inference time, it reasons autoregressively in the deep reasoning space, while still allowing latent thoughts to be inspected or decoded for better explainability.

1d · Views 246 · Likes 2 · Bookmarks 0
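
One way to read "NLL over both latent thoughts and text answers" is as a single autoregressive objective with two terms. The notation below is assumed for illustration and does not come from the thread: $q$ is the prompt, $z_{1:T}$ the continuous latent thoughts, $y_{1:N}$ the answer tokens, $f_\theta$ a generic invertible flow map with base density $\pi$, and $c_t$ the causal context for step $t$. Any weighting between the two terms is not stated in the source.

```latex
\mathcal{L}(\theta)
  = -\sum_{t=1}^{T} \log p_\theta\left(z_t \mid q, z_{<t}\right)
    -\sum_{n=1}^{N} \log p_\theta\left(y_n \mid q, z_{1:T}, y_{<n}\right),
\qquad
\log p_\theta\left(z_t \mid q, z_{<t}\right)
  = \log \pi\left(f_\theta(z_t; c_t)\right)
    + \log \left| \det \frac{\partial f_\theta(z_t; c_t)}{\partial z_t} \right|
```

The second identity is the standard change-of-variables formula for normalizing flows, which is what makes the latent term an exact likelihood rather than a bound.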
Jiatao Gu @thoma_gu

💡The idea: place a STARFlow inside an LLM!

STARFlow is a SOTA normalizing flow built from Deep-Shallow Autoregressive Transformers.

In NF-CoT, shallow invertible layers map continuous latents into a deep reasoning space, where the LLM models them left-to-right alongside text.

1d · Views 243 · Likes 2 · Bookmarks 0
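
To make "sampleable and scoreable with exact likelihood" concrete, here is a toy affine autoregressive flow over a short sequence of latent vectors. It is a minimal sketch and not STARFlow's architecture: the sizes, the weights `W_mu` and `W_s`, and the mean-pooled `context` function are all hypothetical stand-ins for what a causal Transformer would compute. It shows that sampling left to right and scoring a given sequence give the same log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 4, 3  # toy sizes: 4 latent "thoughts" of 3 dims each (hypothetical)
W_mu = rng.normal(scale=0.3, size=(D, D))  # stand-in for a learned shift head
W_s = rng.normal(scale=0.1, size=(D, D))   # stand-in for a learned scale head

def std_normal_logpdf(z):
    """Log density of a D-dimensional standard normal at z."""
    return -0.5 * (z @ z + z.size * np.log(2.0 * np.pi))

def context(prev):
    """Causal summary of earlier latents: their mean, or zeros if none."""
    return prev.mean(axis=0) if len(prev) else np.zeros(D)

def params(prev):
    """Shift and bounded log-scale for the next latent, given earlier ones."""
    h = context(prev)
    return W_mu @ h, np.tanh(W_s @ h)

def sample(num_steps):
    """Draw latents one at a time and accumulate their exact log-likelihood."""
    xs, logp = [], 0.0
    for _ in range(num_steps):
        mu, s = params(np.array(xs))
        z = rng.standard_normal(D)
        xs.append(mu + np.exp(s) * z)            # invertible affine map of noise
        logp += std_normal_logpdf(z) - s.sum()   # change of variables
    return np.array(xs), logp

def log_prob(x):
    """Score a given latent sequence by inverting the flow step by step."""
    total = 0.0
    for t in range(len(x)):
        mu, s = params(x[:t])
        z = (x[t] - mu) * np.exp(-s)
        total += std_normal_logpdf(z) - s.sum()
    return total

x, lp = sample(T)
print(np.isclose(lp, log_prob(x)))  # True: sampling and scoring agree
```

Each step needs only the earlier latents, which is the property that lets this style of generation reuse a KV cache the way token decoding does.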
Jiatao Gu @thoma_gu

Moreover, the exact likelihood over both latent reasoning and text answers makes NF-CoT compatible with GRPO-style post-training with verifiable rewards — like explicit CoT, but in the continuous space and pluggable into existing RL frameworks.

1d · Views 207 · Likes 2 · Bookmarks 0