Emergent reward maximisation from interaction. Maybe one day we won't need to engineer rewards anymore.
https://love4all.ai/blog/emergent-reward-maximization/
https://github.com/nandodef/love4all-ai/tree/main/docs/files
Supporting charts show interventional recovery matching teacher utility curves.
Supportive commenters are optimistic about the research on emergent reward maximization and causal SFT, because these approaches could enable robust continual agent learning with less expert data and less engineering.
SFT is hungry for expert data. Causal SFT relies more on its own interventions and needs far less expert data.
https://love4all.ai/blog/emergent-reward-maximization/
So cool: @NandoDF shows that an agent can learn reward maximization purely from imitation.

@NandoDF I keep thinking this moves the problem to feedback quality. Bad loops still teach weird rewards.

@NandoDF Promising direction. But I doubt reward engineering disappears. We just move it into the interaction protocol and data assumptions.

@NandoDF i hope that day comes before my portfolio account needs reward engineering to forgive me

@NandoDF interesting how observational needs way more data just to catch up, but interventional stabilizes earlier
isn't that the whole debate on how much label quality matters vs quantity?

You’re right. There’s more to the story though.
The last two experiments address robustness to teacher noise and to quantity of teacher data. The causal agent is very effective at learning with few teacher examples.
More philosophically, a causal agent in the real world would experience enough signals to learn who is a good teacher for a task. I’d like to demonstrate this next. Anyone keen to help @AdaptiveAgents and me?

@NandoDF so interventional data is just way more sample efficient

@NandoDF so the interventional approach gets reliable signal even with rare teacher actions. that feels like the practical edge.

@HAF_tech Hopefully into agents continually learning via interaction and minimal engineering.
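
The observational vs interventional contrast that comes up in the replies can be illustrated with a toy calculation. This is only a sketch under made-up assumptions, not the method from the blog post: the hidden context `u`, the teacher policy, the reward table and every number below are hypothetical. It shows one generic reason why data from an agent's own interventions can carry a cleaner signal than logged teacher data: when something unobserved drives both the teacher's action and the reward, the observed action/reward correlation can even point the wrong way.

```python
# Toy confounded decision problem. All quantities are hypothetical and chosen
# only to illustrate observational vs interventional value estimates.

P_U1 = 0.5             # probability that the hidden context u equals 1
P_TEACHER_MATCH = 0.9  # probability that the teacher picks action a == u


def expected_reward(a, u):
    """Mean reward for action a in hidden context u (action 1 costs 0.2)."""
    return 0.3 - 0.2 * a + 0.6 * u


def p_u(u):
    """Prior probability of the hidden context."""
    return P_U1 if u == 1 else 1.0 - P_U1


def p_teacher(a, u):
    """Probability that the teacher takes action a in context u."""
    return P_TEACHER_MATCH if a == u else 1.0 - P_TEACHER_MATCH


def observational_value(a):
    """E[R | A=a] in logged teacher data, where u is never observed."""
    num = sum(p_u(u) * p_teacher(a, u) * expected_reward(a, u) for u in (0, 1))
    den = sum(p_u(u) * p_teacher(a, u) for u in (0, 1))
    return num / den


def interventional_value(a):
    """E[R | do(A=a)]: the agent sets a itself, independently of u."""
    return sum(p_u(u) * expected_reward(a, u) for u in (0, 1))


obs_gap = observational_value(1) - observational_value(0)
int_gap = interventional_value(1) - interventional_value(0)
print(f"observational gap (a=1 minus a=0):  {obs_gap:+.2f}")
print(f"interventional gap (a=1 minus a=0): {int_gap:+.2f}")
```

In this toy setup the logged data makes action 1 look better, because the teacher mostly takes it in the high-reward context, while intervening shows it is actually worse. Real agents would estimate these quantities from samples rather than from a known table.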