Great work on accelerating RL with probing!
Training a model to generate RL tasks that are neither too hard nor too easy requires many solver runs per task just to measure difficulty.
PROPEL instead predicts difficulty with a probe on the model's activations, amortizing that cost and speeding up generator optimization.
New open-ended RL research from @Vmax + @GoodfireAI.


