Professor Completes Policy Gradients Lecture With Proofs And Variance Analysis

Original post

Kosta Derpanis (sabbatical in Zurich) @CSProfKGD · #904 in Tech

Completed my lecture on Policy Gradients 🤗

Teaching is a great way to learn. Through this process, I discovered details, proofs, and subtle assumptions hiding under the hood (see some of my notes in the thread), and picked up a few new presentation techniques along the way.

Kosta Derpanis (sabbatical in Zurich) @CSProfKGD

Made further progress on my lecture on policy gradients. I'm in the zone 😅

TODO: Add proofs of unbiasedness and variance analysis for both temporal-structure reward removal and baseline-based estimators.

Most lectures skim them or skip them altogether. I've always been curious about them, and I'm assuming students will be too.

3:03 AM · Jun 16, 2026 · 1.3K Views

Kosta Derpanis (sabbatical in Zurich) @CSProfKGD

Long version

Kosta Derpanis (sabbatical in Zurich) @CSProfKGD

Additional notes:

Policy Gradients: Reward-to-Go Unbiasedness and Variance (PDF): https://drive.google.com/file/d/1t8yqAbVeI9bC3VZboHICxIJD4e8Q3tEB/view?usp=share_link

Policy Gradients: Baseline Unbiasedness and Variance Reduction (PDF): https://drive.google.com/file/d/15IJ2_7cUfjdkysOa_Qm52DCdtrXBiKvy/view?usp=share_link
