Nando de Freitas demonstrates emergent reward maximization from AI agent interactions using an imitation learner trained without scalar reward labels · Digg