Vanilla IMLE treats all trajectories in the training dataset equally despite their varying quality. Since our goal is to generate trajectories of high quality, we propose reward-weighted IMLE, which emphasizes trajectories with high reward more than those with low reward.
(4/7)
Our method uses Implicit Maximum Likelihood Estimation (IMLE) to train a one-step generative model on trajectories of varying quality. IMLE works by having each trajectory in the training dataset pull the closest generated trajectory towards it.
(3/7)