Model-free agents learn to maximise reward without modelling the environment. Right?
In recent work, we challenge this narrative by proving that agents, trained on a sufficiently rich set of goals, encode a unique and accurate world model in their value functions. 1/








