Exploration has always been an important part of any RL algorithm. But in the era of generative control (i.e., where the base policy is a diffusion or flow model), should we formulate it differently?
It turns out that generative policies introduce both new opportunities and new challenges for exploration. Check out @calvinyluo's 🧵 below to learn more!
Exploration in the Era of Generative Control
Interacting with the world can be expensive!
Our #ICML2026 work shows how diffusion policies can explore during online experience collection to achieve sample-efficient self-improvement!