So next set of improvements for dingbotics: I need to update pufferlib with asymmetric actor-critic. The critic gets privileged state so it can estimate returns better, which helps it beat up the actor (what actually gets deployed) in the right direction.
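
For anyone who hasn't seen the idea, here is a minimal NumPy sketch of the asymmetry, not pufferlib's actual API. The dimensions, `init_mlp`, `act`, and `value` are all hypothetical names made up for illustration. The only point is that the actor sees the deployable observation, while the critic also gets the privileged state during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
OBS_DIM, PRIV_DIM, ACT_DIM, HIDDEN = 8, 4, 2, 16

def init_mlp(in_dim, hidden, out_dim):
    """Random weights for a one-hidden-layer MLP."""
    return {
        "w1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, (hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def mlp(params, x):
    h = np.tanh(x @ params["w1"] + params["b1"])
    return h @ params["w2"] + params["b2"]

# Actor input: observation only. This is what gets deployed.
actor_params = init_mlp(OBS_DIM, HIDDEN, ACT_DIM)
# Critic input: observation plus privileged simulator state (training only).
critic_params = init_mlp(OBS_DIM + PRIV_DIM, HIDDEN, 1)

def act(obs):
    """Action means from the actor; takes no privileged input at all."""
    return mlp(actor_params, obs)

def value(obs, priv):
    """Value estimate from the critic, which sees obs and privileged state."""
    return mlp(critic_params, np.concatenate([obs, priv], axis=-1))[..., 0]

obs = rng.normal(size=(32, OBS_DIM))
priv = rng.normal(size=(32, PRIV_DIM))
actions = act(obs)          # shape (32, ACT_DIM)
values = value(obs, priv)   # shape (32,)
```

Keeping `priv` out of the signature of `act` entirely is one cheap way to make sure the privileged state cannot leak into the deployed policy by accident.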
Positive users praise adding asymmetric actor critic to Pufferlib for Dingbotics as the right approach, while negative users call it unnecessary complexity and a waste of time.

This is something that I learned from reading the mujoco playground baseline. Their network sees more data than mine because they use this; so in a way, it is unfair (even though I'm beating their baseline)
So let's make it a little more fair!
after some experiments and looking at charts, this is a complete waste of time and barely helps

@yacineMTB 2017, wow. https://arxiv.org/abs/1710.06542
the critic already learns so well and so quickly that asymmetric actor / critic is very likely not worth the complexity at all

@BurstOfEntropy dont worry. i read every line of code, nothing slips by me : )

@yacineMTB this is nice! something probably needs privileged state. just be careful the llms don't give the actor some of that same sweet sweet privilege (they are so bad at programming)

@yacineMTB Interesting to think of this as a form of distillation or delegation. The general trend should be inference <<<<<<< training. This is the only way to explain the performance of biological systems. It would be awesome if it somehow became true that your ancestors are watching.

@yacineMTB doing it right