Nando de Freitas demonstrates emergent reward maximization from AI agent interactions, using an imitation learner trained without scalar reward labels.
Supporting charts show interventional recovery matching teacher utility curves.
SFT is hungry for expert data. Causal SFT relies more on its own interventions and needs far less expert data.
love4all.ai
Emergent reward maximization · ❤️ 4 ∀.ai
Can an interactional imitation learner, trained without scalar reward labels, recover behavior that is equivalent to expected reward maximization purely from worldwritten preference evidence? The answer as shown here is…

7:35 PM · May 22, 2026 · 2.6K Views
QUOTE POST
Pedro A. Ortega @ADAPTIVEAGENTS
So cool: @NandoDF shows that an agent can learn reward maximization purely from imitation.
Emergent reward maximisation from interaction. Maybe one day we won't need to engineer rewards anymore. https://love4all.ai/blog/emergent-reward-maximization/ https://github.com/nandodef/love4all-ai/tree/main/docs/files
12:45 PM · May 22, 2026 · 7.3K Views
2:56 PM · May 22, 2026 · 3.5K Views
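The question in the excerpt (can an imitation learner with no scalar reward labels end up acting like a reward maximizer?) can be illustrated with a toy setup. This is not the method from the blog post. It assumes a hypothetical teacher that picks actions with Boltzmann probabilities proportional to exp(β·u(a)) over a hidden utility u. The learner only counts the teacher's choices and then acts greedily on its estimated policy. The utilities, β, and all function names below are illustrative assumptions.

```python
import math
import random
from collections import Counter


def teacher_policy(utilities, beta):
    """Hypothetical Boltzmann-rational teacher: P(a) is proportional to exp(beta * u(a))."""
    weights = [math.exp(beta * u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def sample_demonstrations(utilities, beta, n, seed=0):
    """Draw n teacher actions. Only the chosen actions are recorded, never the utilities."""
    rng = random.Random(seed)
    probs = teacher_policy(utilities, beta)
    return rng.choices(range(len(utilities)), weights=probs, k=n)


def imitate(demos, num_actions):
    """Imitation learner: estimate the teacher's action frequencies by counting."""
    counts = Counter(demos)
    return [counts[a] / len(demos) for a in range(num_actions)]


def greedy_action(policy):
    """Act on the imitated policy by picking its most probable action."""
    return max(range(len(policy)), key=lambda a: policy[a])


if __name__ == "__main__":
    # Hypothetical utilities; the learner never sees these numbers.
    hidden_utilities = [0.1, 0.9, 0.4, 0.7]
    demos = sample_demonstrations(hidden_utilities, beta=5.0, n=5000)
    learned = imitate(demos, len(hidden_utilities))
    print("imitated policy:", [round(p, 3) for p in learned])
    # With this many samples the most frequent action is the utility maximizer.
    print("greedy action:", greedy_action(learned))
```

Because the Boltzmann probabilities are monotone in u, the mode of the teacher's policy is the utility-maximizing action, so counting choices is enough to recover it without any reward labels. Whether the blog's interactional setup reduces to something this simple is an open assumption here.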