
Jiatao Gu releases NF-CoT, using normalizing flows for continuous latent LLM reasoning instead of discrete text tokens

The system supports GRPO training and KV-cache decoding.

Original post
Jiatao Gu@thoma_gu

🤔Can LLMs reason by sampling continuous thoughts — not just tokens?

Introducing NF-CoT: Latent Reasoning with Normalizing Flows. It samples continuous chains of thought directly in the LLM's stream with exact likelihood, powered by STARFlow.

🌐Page: http://nf-cot.vercel.app

1:12 PM · Jun 8, 2026 · 5.5K Views
Posts from X
Murray Kang@haoqik322

Excited to share our follow-up to LaDiR: Latent reasoning with Normalizing Flows (NF-CoT)!

Instead of iterative diffusion denoising, NF-CoT integrates STARFlow to generate continuous thoughts autoregressively in LLMs, like tokens — with exact likelihood, KV-cache-friendly decoding, and compatibility with policy-gradient RL training such as GRPO.

4h · Views 1.5K · Likes 14 · Bookmarks 3
Jiatao Gu@thoma_gu

This work was led by two amazing @PennEngineers master’s students, @Guancheng_Tu @EthanFu0355525.

Also huge thanks to our great collaborators @SuhaoYu1020, @tyao923, @haoqik322, @Lianhuiq, and @YizheZhangNLP!

More details: http://arxiv.org/abs/2606.06447
Code & Model: coming soon.

5h · Views 329 · Likes 5 · Bookmarks 4
Jiatao Gu@thoma_gu

🧐Why is this hard?

Simply moving CoT beyond tokens is not enough. A useful latent reasoning space should still keep what makes token CoT powerful:
• sampling diverse trajectories
• scoring probabilistically
• training with likelihood
• decoding efficiently with KV caches

5h · Views 736 · Likes 2 · Bookmarks 0
Murray Kang@haoqik322

Excited to share our follow-up to LaDiR: Latent reasoning with Normalizing Flows (NF-CoT)!

Instead of iterative diffusion denoising, NF-CoT uses normalizing flows to generate continuous thoughts autoregressively, like tokens — with exact likelihood, KV-cache-friendly decoding, and compatibility with policy-gradient RL training such as GRPO.

5h · Views 814 · Likes 8 · Bookmarks 2
Jiatao Gu@thoma_gu

Prior latent-CoT methods each make a trade-off. For example:
• Coconut-style feedback is efficient, but mostly deterministic.
• LaDiR-style diffusion latents are stochastic, but iterative and likelihood-intractable.

NF-CoT hits the sweet spot: stochasticity + likelihood + efficiency.

5h · Views 377 · Likes 2 · Bookmarks 0
Jiatao Gu@thoma_gu

NF-CoT is also cheaper than LaDiR.

Inference: 1.9× faster end-to-end, with 2.5× fewer FLOPs/sample.
Training: 2.85× higher sample throughput, with 6.66× fewer FLOPs.

-> Latent thoughts are generated like tokens — autoregressively, cache-friendly, without iterative denoising.

5h · Views 267 · Likes 1 · Bookmarks 0
Jiatao Gu@thoma_gu

💡The idea: place a STARFlow inside the LLM!

STARFlow is a SOTA normalizing flow built from Deep-Shallow Autoregressive Transformers.

In NF-CoT, shallow invertible layers map continuous latents into a deep reasoning space, where the LLM models them left-to-right alongside text.

5h · Views 121 · Likes 1 · Bookmarks 0
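
The "STARFlow inside the LLM" idea rests on invertible layers that give an exact likelihood through the change-of-variables formula. As a rough illustration only, the sketch below implements a generic affine autoregressive flow over scalar latents in plain Python. It is not STARFlow's architecture: the function names and the toy conditioners `toy_shift` and `toy_log_scale` are hypothetical stand-ins for learned networks, and real latents would be vectors rather than scalars.

```python
import math


def affine_ar_forward(x, shift_fn, log_scale_fn):
    """Map latents x -> z with an autoregressive affine transform.

    z[t] = (x[t] - shift(x[:t])) * exp(-log_scale(x[:t]))
    Returns z and log|det dz/dx| = -sum_t log_scale(x[:t]).
    """
    z, log_det = [], 0.0
    for t in range(len(x)):
        mu = shift_fn(x[:t])
        s = log_scale_fn(x[:t])
        z.append((x[t] - mu) * math.exp(-s))
        log_det -= s
    return z, log_det


def affine_ar_inverse(z, shift_fn, log_scale_fn):
    """Invert the transform one step at a time (this is the sampling direction)."""
    x = []
    for t in range(len(z)):
        mu = shift_fn(x[:t])  # depends only on already-generated latents
        s = log_scale_fn(x[:t])
        x.append(z[t] * math.exp(s) + mu)
    return x


def standard_normal_logpdf(z):
    """Log-density of z under an isotropic standard normal base distribution."""
    return sum(-0.5 * (v * v + math.log(2 * math.pi)) for v in z)


def exact_log_likelihood(x, shift_fn, log_scale_fn):
    """log p(x) = log N(z; 0, I) + log|det dz/dx| (change of variables)."""
    z, log_det = affine_ar_forward(x, shift_fn, log_scale_fn)
    return standard_normal_logpdf(z) + log_det


# Hypothetical toy conditioners standing in for a learned network.
def toy_shift(prefix):
    return 0.5 * prefix[-1] if prefix else 0.0


def toy_log_scale(prefix):
    return 0.1 * len(prefix)
```

Because each step conditions only on earlier latents, sampling draws noise `z`, then calls `affine_ar_inverse` left to right, which is the property that makes token-style cached decoding plausible.
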
Jiatao Gu@thoma_gu

The whole model is trained end-to-end with NLL over both latent thoughts and text answers.

At inference time, it reasons autoregressively in the deep reasoning space, while still allowing latent thoughts to be inspected or decoded for better explainability.

5h · Views 116 · Likes 1 · Bookmarks 0
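
To make "NLL over both latent thoughts and text answers" concrete, here is a minimal sketch of how such a joint objective could be assembled. It assumes a standard normal base distribution for the flow-transformed latents and a softmax over answer tokens; the posts do not specify the exact parameterization, so the function names, arguments, and the simple additive form are illustrative assumptions.

```python
import math


def gaussian_nll(z):
    """NLL of flow-transformed latents under a standard normal base."""
    return sum(0.5 * (v * v + math.log(2 * math.pi)) for v in z)


def token_nll(logits_per_step, target_ids):
    """Cross-entropy (NLL) of the target tokens under softmax(logits)."""
    total = 0.0
    for logits, target in zip(logits_per_step, target_ids):
        m = max(logits)  # subtract the max for numerical stability
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_norm - logits[target]
    return total


def joint_nll(z_latents, log_det, logits_per_step, target_ids):
    """-log p(thoughts) - log p(answer | thoughts), as one scalar loss.

    The latent term applies change of variables: -log N(z) - log|det dz/dx|.
    """
    latent_term = gaussian_nll(z_latents) - log_det
    return latent_term + token_nll(logits_per_step, target_ids)
```

Both terms are proper negative log-likelihoods, so they can be summed into a single loss and minimized end to end.
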
Jiatao Gu@thoma_gu

Moreover, the exact likelihood over both latent reasoning and text answers makes NF-CoT compatible with GRPO-style post-training with verifiable rewards — like explicit CoT, but in the continuous space and pluggable into existing RL frameworks.

5h · Views 109 · Likes 1 · Bookmarks 0
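
GRPO compatibility follows from having a log-likelihood for every sampled trajectory, latent or not. The sketch below shows the two ingredients in simplified form: group-relative advantages from verifiable rewards, and a clipped policy-gradient surrogate driven by log-probabilities. It is a sequence-level simplification with no KL penalty, it uses the population standard deviation (implementations differ on this), and the function names are hypothetical rather than taken from the NF-CoT code, which is not yet released.

```python
import math
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def grpo_surrogate(logp_new, logp_old, advantages, clip=0.2):
    """Clipped surrogate objective averaged over the group (to be maximized).

    logp_new / logp_old are exact log-likelihoods of each sampled trajectory
    under the current and the sampling policy, respectively.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip), 1.0 + clip)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

For example, rewards of `[1, 0, 0, 1]` from a pass/fail verifier give advantages close to `[1, -1, -1, 1]`, so passing samples are pushed up and failing ones pushed down.
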
Jiatao Gu@thoma_gu

Results on Qwen3-8B-Base across 5 code benchmarks:

Avg pass@1 improves from 55.8 → 68.8 (+13.0), and reaches 70.1 after RL.

NF-CoT also outperforms the strongest latent baseline, LaDiR, by +7.1%. On MBPP+, pass@1 = 72.1 — matching the base model’s pass@128.

5h · Views 103 · Likes 1 · Bookmarks 0
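
For readers unfamiliar with the pass@1 and pass@128 figures quoted in the results, the sketch below computes the widely used unbiased pass@k estimator from the HumanEval/Codex evaluation setup. The posts do not say which estimator NF-CoT's evaluation uses, so treating it as this one is an assumption.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples, c of which are correct.

    Equals 1 - C(n - c, k) / C(n, k): the probability that a random
    size-k subset of the n samples contains at least one correct one.
    """
    if k > n:
        raise ValueError("k must not exceed n")
    if n - c < k:
        return 1.0  # every size-k subset must include a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With `k = 1` this reduces to the fraction of correct samples, `c / n`, while larger `k` rewards a model for solving the problem at least once among many attempts.
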