i’ve come around to understanding that the golden path application of OPD is MOPD: it parallelizes exploration and extends the effective number of RL steps you can juice out of any given async recipe
branch-train-merge for on-policy learning
6:24 PM · Jun 30, 2026
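a toy sketch of the branch-train-merge shape, under loud assumptions: i'm reading OPD as on-policy distillation and MOPD as its multi-teacher variant, which the post itself doesn't spell out. the "policy" is just one row of logits per domain, the RL step is an exact policy gradient, and the distillation step uses the exact expectation over the student's own distribution as a deterministic stand-in for sampled rollouts. every name here (`REWARDS`, `rl_step`, `opd_step`) is hypothetical, not anyone's actual recipe.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def expected_reward(logits, rewards):
    return sum(p * r for p, r in zip(softmax(logits), rewards))

def reverse_kl(student_logits, teacher_logits):
    # KL(student || teacher): the expectation is under the student's own distribution.
    q, p = softmax(student_logits), softmax(teacher_logits)
    return sum(qi * (math.log(qi) - math.log(pi)) for qi, pi in zip(q, p))

def rl_step(logits, rewards, lr):
    # Exact policy-gradient ascent on expected reward (mean-reward baseline).
    q = softmax(logits)
    baseline = sum(qi * r for qi, r in zip(q, rewards))
    return [z + lr * qi * (r - baseline) for z, qi, r in zip(logits, q, rewards)]

def opd_step(student_logits, teacher_logits, lr):
    # Exact gradient descent on KL(student || teacher) w.r.t. the student logits.
    q, p = softmax(student_logits), softmax(teacher_logits)
    gap = [math.log(qi) - math.log(pi) for qi, pi in zip(q, p)]
    kl = sum(qi * g for qi, g in zip(q, gap))
    return [z - lr * qi * (g - kl) for z, qi, g in zip(student_logits, q, gap)]

# Hypothetical toy domains: each one rewards a different action out of three.
REWARDS = {"math": [1.0, 0.0, 0.0], "code": [0.0, 0.0, 1.0]}
base = {d: [0.0, 0.0, 0.0] for d in REWARDS}

# Branch + train: every branch copies the base policy and does RL on one domain only.
# The branches never read each other's state, so this loop could run in parallel.
teachers = {}
for domain, rewards in REWARDS.items():
    branch = {d: list(z) for d, z in base.items()}
    for _ in range(200):
        branch[domain] = rl_step(branch[domain], rewards, lr=1.0)
    teachers[domain] = branch

# Merge: one student, starting from base, distills from the matching teacher per domain.
student = {d: list(z) for d, z in base.items()}
for _ in range(500):
    for domain in REWARDS:
        student[domain] = opd_step(student[domain], teachers[domain][domain], lr=1.0)

for domain, rewards in REWARDS.items():
    print(domain, round(expected_reward(student[domain], rewards), 3))
```

the part that matters is the structure: the branch loop has no cross-branch dependency, and the merge only ever scores the student's own distribution against whichever teacher owns that domain.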