New Trick Approximates Jacobian To Stabilize RL For Flow Policies

Original post by Sergey Levine (@svlevine)

Diffusion (or flow) models make excellent policies, but training them with RL is notoriously hard: BPTT is unstable, and RL over the diffusion steps blows up the horizon. In our new paper, we show how to optimize flow matching actors with "one weird trick": "approximate" the Jacobian of the flow denoising process with the identity matrix. 👇

8:52 PM · Jun 9, 2026 · 11.7K Views
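
For readers who want the approximation spelled out, here is a short sketch. It assumes the flow policy is sampled with K Euler steps of a learned velocity field; the symbols v_θ, h, a_k and s are notation chosen here, not taken from the post, and the paper may apply the approximation at a different point in the sampler.

```latex
% Assumed Euler sampler for the flow policy (notation is illustrative).
a_{k+1} = a_k + h\, v_\theta(a_k, t_k, s), \qquad k = 0, \dots, K-1

% Exact Jacobian of the final action with respect to the initial noise,
% with J_k the Jacobian of a single Euler step.
\frac{\partial a_K}{\partial a_0} = J_{K-1} J_{K-2} \cdots J_0,
\qquad J_k = I + h\, \frac{\partial v_\theta}{\partial a}(a_k, t_k, s)

% Chain rule for the critic, and the identity "approximation".
\nabla_{a_0} Q(s, a_K)
  = \left( \frac{\partial a_K}{\partial a_0} \right)^{\!\top} \nabla_a Q(s, a) \big|_{a = a_K}
  \;\approx\; \nabla_a Q(s, a) \big|_{a = a_K}
```

Under this reading, the product of K step Jacobians is the term that backpropagation through the sampler would have to compute, and it is the term that gets replaced by I.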

Sergey Levine (@svlevine)

Our method (QGF) outperforms both using the true Jacobian and BPTT. It is entirely a test-time method: the policy is trained with BC, the Q-function is trained with TD, and at test time the Q-function is optimized with respect to actions using the identity Jacobian "approximation".
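
To make the test-time recipe concrete, here is a minimal one-dimensional sketch. It is not the paper's implementation: the linear velocity field stands in for a BC-trained flow policy, the quadratic critic stands in for a TD-trained Q-function, and every name (`velocity`, `denoise`, `q_grad`, `optimize_noise`) is hypothetical. It only shows what swapping the flow Jacobian for the identity looks like in a gradient-ascent loop over the initial noise.

```python
# Toy sketch with hypothetical names; not the QGF implementation.

def velocity(a, mu=0.0, c=1.0):
    """Stand-in for a BC-trained flow velocity field: pulls the action toward mu."""
    return c * (mu - a)


def denoise(a0, steps=10):
    """Euler-integrate the toy flow from initial noise a0 to a final action."""
    h = 1.0 / steps
    a = a0
    for _ in range(steps):
        a = a + h * velocity(a)
    return a


def q_grad(a, a_star=1.0):
    """Gradient of a toy critic Q(a) = -(a - a_star)**2 with respect to the action."""
    return -2.0 * (a - a_star)


def true_jacobian(steps=10, c=1.0):
    """Exact d(final action)/d(a0) for this linear toy flow: (1 - h*c)**steps."""
    h = 1.0 / steps
    return (1.0 - h * c) ** steps


def optimize_noise(a0, iters=50, lr=0.5, use_identity=True):
    """Gradient ascent on Q(denoise(a0)) with respect to the initial noise a0."""
    jac = 1.0 if use_identity else true_jacobian()
    for _ in range(iters):
        a_final = denoise(a0)
        # Chain rule: dQ/da0 = J * dQ/da_final; the identity variant sets J = 1.
        a0 = a0 + lr * jac * q_grad(a_final)
    return a0, denoise(a0)


if __name__ == "__main__":
    noise, action = optimize_noise(-0.5, use_identity=True)
    print("identity Jacobian:", noise, action)
    noise, action = optimize_noise(-0.5, use_identity=False)
    print("true Jacobian:    ", noise, action)
```

In this toy the Jacobian is a positive scalar, so the identity substitution only rescales the step size and both variants head toward the same optimum. It says nothing about the high-dimensional comparison reported in the post.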

Sergey Levine (@svlevine)

To find out more, check out the paper and website here: https://q-guided-flow.github.io/

A fun collaboration with @zhiyuan_zhou_, @andy_peng05, @CharlesXu0124, @qiyang_li, @kvfrans, @jtspringenberg
