New Trick Approximates Jacobian To Stabilize RL For Flow Policies

Original post

Diffusion (or flow) models make for excellent policies, but training them with RL is notoriously hard: backpropagation through time (BPTT) is unstable, and running RL over the denoising steps blows up the horizon. In our new paper, we show how to optimize flow matching actors using "one weird trick": "approximate" the Jacobian of the flow denoising process with the identity matrix. 👇

8:52 PM · Jun 9, 2026 · 13K Views

Sergey Levine (@svlevine)


Sentiment

The only comment with sentiment was negative and off-topic: rather than addressing the Jacobian approximation technique for stabilizing RL flow policies, it accused researcher @svlevine of chasing personal romantic interests among prominent AI figures instead of pursuing technical work.

Positive: 0.0% · Negative: 100.0% (1 comment with sentiment)

Cluster Engagement

Posts from X

Most Activity

Views: 3.6K · Likes: 20 · Replies: 1

Sergey Levine@svlevine

Our method (QGF) outperforms using the true Jacobian or BPTT. It is entirely a test-time method: the policy is trained with BC, the Q-function is trained with TD, and at test time the Q-function is optimized with respect to actions using the identity Jacobian "approximation".
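To make the test-time recipe concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation, and every name and constant in it (`BC_MEAN`, `BC_STD`, `Q_OPT`, `velocity`, `sample_action`, `guidance_scale`) is a hypothetical stand-in. It assumes a closed-form Gaussian flow in place of a BC-trained flow policy, a quadratic function in place of a TD-trained critic, and one particular reading of the trick: the Q-gradient is evaluated at a one-step estimate of the denoised action and applied directly to the intermediate action, which is what the chain rule gives if the Jacobian between the two is replaced by the identity.

```python
import random

# All constants below are hypothetical toy values, not taken from the paper.
BC_MEAN = (0.0, 0.0)   # mean of the toy "BC-trained" action distribution
BC_STD = 0.5           # its standard deviation
Q_OPT = (1.0, 0.5)     # action that maximizes the toy Q-function


def q_value(a):
    """Toy critic Q(a) = -||a - Q_OPT||^2, standing in for a TD-trained Q."""
    return -sum((ai - qi) ** 2 for ai, qi in zip(a, Q_OPT))


def q_grad(a):
    """Analytic gradient of the toy critic with respect to the action."""
    return [-2.0 * (ai - qi) for ai, qi in zip(a, Q_OPT)]


def velocity(a, t):
    """Closed-form flow-matching velocity that carries N(0, I) to
    N(BC_MEAN, BC_STD^2 I) along linear interpolation paths."""
    var_t = (1.0 - t) ** 2 + (t * BC_STD) ** 2
    coef = (t * BC_STD ** 2 - (1.0 - t)) / var_t
    return [m + coef * (ai - t * m) for ai, m in zip(a, BC_MEAN)]


def sample_action(noise, guidance_scale, num_steps=20):
    """Euler-integrate the flow from noise, optionally guided by the critic."""
    a = list(noise)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(a, t)
        # One-step estimate of the final denoised action.
        a_hat = [ai + (1.0 - t) * vi for ai, vi in zip(a, v)]
        # Identity-Jacobian "approximation" (assumed form): use dQ/d(a_hat)
        # directly as the gradient for a, with no backprop through the flow.
        g = q_grad(a_hat)
        a = [ai + dt * (vi + guidance_scale * gi)
             for ai, vi, gi in zip(a, v, g)]
    return a


def mean_q(guidance_scale, num_samples=200, seed=0):
    """Average toy Q-value of sampled actions for a given guidance scale."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        noise = [rng.gauss(0.0, 1.0) for _ in BC_MEAN]
        total += q_value(sample_action(noise, guidance_scale))
    return total / num_samples
```

Comparing `mean_q(0.0)` with `mean_q(1.0)` contrasts unguided and guided sampling on the same noise draws; in this toy setting the guided actions score higher under the critic, and neither the flow nor the critic is retrained.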

Bookmarks: 12

Sergey Levine@svlevine

To find out more, check out the paper and website here: https://q-guided-flow.github.io/

A fun collaboration with @zhiyuan_zhou_, @andy_peng05, @CharlesXu0124, @qiyang_li, @kvfrans, @jtspringenberg

Just An Artist@BraveSoul248714

@svlevine Gosh, he is literally chasing all hot guys in the ai, software and tech industry. My gosh, dude, you might have a preference for bfs but maybe the guys don't want to be in a relationship with you. Why are you collaborating with every single hot guy in the tech industry in the 🌎?