AI · 2h ago

New Trick Approximates Jacobian To Stabilize RL For Flow Policies

Original post
Sergey Levine @svlevine · #32 in AI

Diffusion (or flow) makes for excellent policies, but training them with RL is notoriously hard: BPTT is unstable, RL over diffusion blows up the horizon. In our new paper, we show how we can optimize flow matching actors by using "one weird trick" -- "approximate" the Jacobian of the flow denoising process with the identity matrix. 👇

8:52 PM · Jun 9, 2026 · 13K Views
Sentiment

Users reacted negatively to the Jacobian approximation technique for stabilizing RL flow policies by accusing researcher @svlevine of chasing personal romantic interests among prominent AI figures instead of pursuing technical work.

Pos
0.0%
Neg
100.0%
1 comment with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views 3.6K · Likes 20 · Replies 1
Sergey Levine @svlevine

Our method (QGF) outperforms using the true Jacobian or BPTT. It is entirely a test-time method (i.e., the policy is trained with BC, the Q-function is trained with TD, and at test-time, optimize the Q-function wrt actions using the identity Jacobian "approximation").

2h · Views 3.6K · Likes 20 · Bookmarks 5
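
The test-time procedure described in the QGF post above (improve actions by following the Q-function's gradient while treating the flow's Jacobian as the identity) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the linear velocity field standing in for a BC-trained flow policy, the quadratic critic standing in for a TD-trained Q-function, the choice to apply the update in noise space, and the names `flow_action`, `grad_q`, `q_value`, and `refine_noise`. The paper's actual algorithm may differ.

```python
import numpy as np

def flow_action(z, mu, n_steps=10):
    """Toy flow policy (hypothetical): Euler-integrate dx/dt = mu - x
    from noise z at t=0 to an action at t=1."""
    x = z.copy()
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        x = x + dt * (mu - x)
    return x

def q_value(a, a_star):
    """Toy critic (hypothetical): Q(a) = -||a - a_star||^2."""
    return -np.sum((a - a_star) ** 2)

def grad_q(a, a_star):
    """Gradient of the toy critic with respect to the action."""
    return -2.0 * (a - a_star)

def refine_noise(z, mu, a_star, lr=0.5, n_iters=50):
    """Gradient ascent on Q in noise space.

    The exact gradient is J^T dQ/da with J = d(action)/d(noise).
    Here J is replaced by the identity, so the update is just dQ/da
    and no backpropagation through the flow steps is needed.
    """
    z = z.copy()
    for _ in range(n_iters):
        a = flow_action(z, mu)          # forward pass only
        z = z + lr * grad_q(a, a_star)  # identity-Jacobian update
    return z

rng = np.random.default_rng(0)
mu = np.zeros(2)                 # mode of the toy behavior policy
a_star = np.array([1.0, -0.5])   # action that maximizes the toy critic
z0 = rng.standard_normal(2)

a_before = flow_action(z0, mu)
a_after = flow_action(refine_noise(z0, mu, a_star), mu)
print(q_value(a_before, a_star), q_value(a_after, a_star))
```

In this toy the true Jacobian happens to be a positive multiple of the identity, so the approximation only rescales the step size. That property is specific to the linear example and is not guaranteed for a learned flow.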
Bookmarks 12
Sergey Levine @svlevine

To find out more, check out the paper and website here: https://q-guided-flow.github.io/

A fun collaboration with @zhiyuan_zhou_, @andy_peng05, @CharlesXu0124, @qiyang_li, @kvfrans, @jtspringenberg

2h · Views 2.9K · Likes 16 · Bookmarks 12
Just An Artist @BraveSoul248714

@svlevine Gosh, he is literally chasing all hot guys in the ai, software and tech industry. My gosh, dude, you might have a preference for bfs but maybe the guys don't want to be in a relationship with you. Why are you collaborating with every single hot guy in the tech industry in the 🌎?

2h