Completed my lecture on Policy Gradients 🤗
Teaching is a great way to learn. Through this process, I discovered details, proofs, and subtle assumptions hiding under the hood (see some of my notes in the thread), and picked up a few new presentation techniques along the way.
Made further progress on my lecture on policy gradients. I'm in the zone 😅
TODO: Add proofs of unbiasedness and a variance analysis for both estimators: the one that uses temporal structure to drop rewards earned before an action, and the baseline-based one.
Most lectures skim over them or skip them altogether. I've always been curious about them, and I'm assuming students will be too.
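For a concrete feel of the baseline half of that TODO, here is a minimal numeric sketch, not the lecture's own material. It assumes a one-step (bandit) problem with a softmax policy over three actions, and every name and number in it (`softmax`, `score`, `expected_gradient`, `total_variance`, the logits, the rewards) is made up for illustration. It computes exact expectations rather than sampling, so it can check that subtracting a constant baseline leaves the expected gradient unchanged, which follows from the score function having zero mean under the policy.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def score(probs, a):
    # Gradient of log pi(a) with respect to the logits of a softmax policy:
    # one_hot(a) - probs.
    return [(1.0 if i == a else 0.0) - p for i, p in enumerate(probs)]


def expected_gradient(probs, rewards, baseline):
    # Exact expectation over actions of score(a) * (r(a) - baseline).
    n = len(probs)
    g = [0.0] * n
    for a, p in enumerate(probs):
        s = score(probs, a)
        for i in range(n):
            g[i] += p * s[i] * (rewards[a] - baseline)
    return g


def total_variance(probs, rewards, baseline):
    # Sum over gradient components of Var[score_i(a) * (r(a) - baseline)].
    mean = expected_gradient(probs, rewards, baseline)
    n = len(probs)
    var = 0.0
    for a, p in enumerate(probs):
        s = score(probs, a)
        for i in range(n):
            var += p * (s[i] * (rewards[a] - baseline) - mean[i]) ** 2
    return var


# Hypothetical policy logits and per-action rewards.
probs = softmax([0.2, -0.5, 1.0])
rewards = [1.0, 3.0, 2.0]

# Baseline choice for this sketch: the expected reward under the policy.
baseline = sum(p * r for p, r in zip(probs, rewards))

g_plain = expected_gradient(probs, rewards, 0.0)
g_base = expected_gradient(probs, rewards, baseline)
var_plain = total_variance(probs, rewards, 0.0)
var_base = total_variance(probs, rewards, baseline)

print("gradient without baseline:", g_plain)
print("gradient with baseline:   ", g_base)
print("variance without baseline:", var_plain)
print("variance with baseline:   ", var_base)
```

The two gradients agree up to floating-point error, which is the unbiasedness claim in miniature. The variance is lower with the baseline for these particular numbers, but that depends on the baseline chosen and is not guaranteed in general.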
