Stanford's Shuran Song highlights how generative control policies, such as diffusion models, can actively explore for sample-efficient reinforcement learning

By exploring during online experience collection, the approach aims to cut the number of costly real-world interactions needed for training.

Original post

Shuran Song @SongShuran

Exploration has always been an important part of any RL algorithm. But in the era of generative control (i.e., where the base policy is a diffusion or flow model), should we formulate it differently?

It turns out that generative policies introduce both new opportunities and challenges for exploration. Check out @calvinyluo 🧵👇 below to learn more!

Calvin Luo @calvinyluo

🔭 Exploration in the Era of Generative Control 🤖

Interacting with the world can be expensive!

Our #ICML2026 work shows how diffusion policies can explore during online experience collection to achieve sample-efficient self-improvement! 📈

8:42 AM · Jun 30, 2026 · 131 Views
