UPenn and Apple researcher Jiatao Gu introduces NF-CoT, enabling LLM latent reasoning with normalizing flows over continuous latents instead of text tokens

The method is compatible with GRPO reinforcement learning training.

Posts from X

Jiatao Gu (@thoma_gu)

🧐Why is this hard?

Simply moving CoT beyond tokens is not enough. A useful latent reasoning space should still keep what makes token CoT powerful:
• sampling diverse trajectories
• scoring probabilistically
• training with likelihood
• decoding efficiently with KV caches

Jiatao Gu (@thoma_gu)

This work was led by two amazing @PennEngineers master’s students, @Guancheng_Tu @EthanFu0355525.

Also huge thanks to our great collaborators @SuhaoYu1020, @tyao923, @haoqik322, @Lianhuiq, and @YizheZhangNLP!

More details: http://arxiv.org/abs/2606.06447
Code & Model: Coming soon.

Jiatao Gu (@thoma_gu)

NF-CoT is also cheaper than LaDiR.

Inference: 1.9× faster end-to-end, with 2.5× fewer FLOPs/sample. Training: 2.85× higher sample throughput, with 6.66× fewer FLOPs.

-> Latent thoughts are generated like tokens — autoregressively, cache-friendly, without iterative denoising.

Jiatao Gu (@thoma_gu)

Prior latent-CoT methods each make a trade-off. For example:
• Coconut-style feedback is efficient, but mostly deterministic.
• LaDiR-style diffusion latents are stochastic, but iterative and likelihood-intractable.

NF-CoT keeps the sweet spot -- stochasticity + likelihood + efficiency.

Jiatao Gu (@thoma_gu)

💡 The idea: place a STARFlow inside an LLM!

STARFlow is a SOTA normalizing flow built from Deep-Shallow Autoregressive Transformers.

In NF-CoT, shallow invertible layers map continuous latents into a deep reasoning space, where the LLM models them left-to-right alongside text.
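
To illustrate why invertible layers give an exact likelihood, here is a minimal Python sketch of a generic affine autoregressive flow over scalar latents. It is not STARFlow or the NF-CoT architecture: the function names, the toy running-mean context, and the `shift_w` / `scale_w` parameters are all assumptions made for this example.

```python
import math

def _affine_params(prefix, shift_w, scale_w):
    """Toy causal conditioner (assumption): depends only on earlier latents."""
    ctx = sum(prefix) / len(prefix) if prefix else 0.0
    mu = shift_w * ctx
    log_sigma = math.tanh(scale_w * ctx)  # bounded log-scale keeps the map stable
    return mu, log_sigma

def forward(x, shift_w, scale_w):
    """Map latents x -> z left-to-right; also return log|det dz/dx|."""
    z, log_det = [], 0.0
    for t, x_t in enumerate(x):
        mu, log_sigma = _affine_params(x[:t], shift_w, scale_w)
        z.append((x_t - mu) * math.exp(-log_sigma))
        log_det -= log_sigma  # triangular Jacobian: sum of log diagonal terms
    return z, log_det

def inverse(z, shift_w, scale_w):
    """Invert the map sequentially, reusing the already-recovered prefix."""
    x = []
    for z_t in z:
        mu, log_sigma = _affine_params(x, shift_w, scale_w)
        x.append(z_t * math.exp(log_sigma) + mu)
    return x

def log_likelihood(x, shift_w, scale_w):
    """Exact log p(x) via change of variables with a standard normal prior on z."""
    z, log_det = forward(x, shift_w, scale_w)
    log_prior = sum(-0.5 * (v * v + math.log(2 * math.pi)) for v in z)
    return log_prior + log_det

latents = [0.3, -1.2, 0.7]
z, _ = forward(latents, shift_w=0.5, scale_w=0.8)
recovered = inverse(z, shift_w=0.5, scale_w=0.8)
```

Because each step conditions only on earlier latents, the Jacobian is triangular, so the log-determinant is a simple sum and the same model can both score a trajectory and sample one by drawing `z` from the prior and calling `inverse`.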

Jiatao Gu (@thoma_gu)

The whole model is trained end-to-end with NLL over both latent thoughts and text answers.

At inference time, it reasons autoregressively in the deep reasoning space, while still allowing latent thoughts to be inspected or decoded for better explainability.

Jiatao Gu (@thoma_gu)

Moreover, the exact likelihood over both latent reasoning and text answers makes NF-CoT compatible with GRPO-style post-training with verifiable rewards — like explicit CoT, but in the continuous space and pluggable into existing RL frameworks.
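
For readers unfamiliar with GRPO, the sketch below shows the group-relative advantage it is built on, and how a per-sample log-likelihood plugs into a policy-gradient surrogate. This is a simplified illustration, not the paper's training code: the function names are hypothetical, the population standard deviation is one of several conventions, and the clipping and KL terms used in full GRPO are omitted.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mean) / (std + eps) for r in rewards]

def surrogate_loss(log_likelihoods, advantages):
    """REINFORCE-style surrogate: the mean of -(advantage * log-likelihood).

    Each log-likelihood would cover the whole sampled trajectory, which for a
    latent-CoT model means latent thoughts plus the text answer.
    """
    total = sum(a * lp for a, lp in zip(advantages, log_likelihoods))
    return -total / len(advantages)

# Four sampled solutions to one prompt, rewarded 1 if the unit tests pass.
rewards = [1.0, 0.0, 0.0, 1.0]
log_likelihoods = [-2.0, -3.0, -4.0, -1.0]  # made-up values for illustration
advantages = group_relative_advantages(rewards)
loss = surrogate_loss(log_likelihoods, advantages)
```

The only model-specific ingredient is the log-likelihood of each sampled trajectory, which is why a tractable likelihood matters for this style of post-training.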

Jiatao Gu (@thoma_gu)

Results on Qwen3-8B-Base across 5 code benchmarks:

Avg pass@1 improves from 55.8 → 68.8 (+13.0), and reaches 70.1 after RL.

NF-CoT also outperforms the strongest latent baseline, LaDiR, by +7.1%. On MBPP+, pass@1 = 72.1 — matching the base model’s pass@128.
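
The post does not say how pass@1 and pass@128 were estimated. A common choice for code benchmarks is the unbiased estimator 1 - C(n-c, k) / C(n, k), computed from n samples per problem of which c pass the tests. The sketch below assumes that estimator; the function name and the example numbers are illustrative, not taken from the paper.

```python
from math import comb

def pass_at_k(n, c, k):
    """Estimate the probability that at least one of k samples is correct.

    n: samples drawn per problem, c: samples that passed, k: budget (k <= n).
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) counts the k-subsets that contain no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k = 1 the estimator reduces to the fraction of correct samples.
single = pass_at_k(n=10, c=3, k=1)
# A larger budget raises the score for the same samples.
pair = pass_at_k(n=4, c=1, k=2)  # 1 - 3/6 = 0.5
```

Read this way, a model whose pass@1 matches another model's pass@128 gets in one attempt what the other needs 128 attempts to reach.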