Alistair Letcher mathematically proves that model-free RL agents trained on diverse goals encode world models inside their Q-values · Digg