A new paper proves that imperfect world models are fundamentally exploitable, causing reinforcement learning agents to misrank policies.
The paper links this failure mode to reward hacking.
——0——
Commenters thank the posters and call the paper, which shows how imperfect world models enable policy exploitation in RL, interesting for its useful analysis and its connection to classical ideas about models.
3 comments with sentiment.