Nando de Freitas demonstrates emergent reward maximization from AI agent interactions using an imitation learner trained without scalar reward labels · Digg

Posts from X

Views 7.6K · Bookmarks 76 · Likes 70 · Retweets 11 · Replies 5

Nando de Freitas@NandoDF

SFT is hungry for expert data. Causal SFT relies more on its own interventions and needs far less expert data.

https://love4all.ai/blog/emergent-reward-maximization/

38d7.6K7076

Pedro A. Ortega@AdaptiveAgents

So cool: @NandoDF shows that an agent can learn reward maximization purely from imitation.

Nando de Freitas@NandoDF

Emergent reward maximisation from interaction. Maybe one day we won't need to engineer rewards anymore.

https://love4all.ai/blog/emergent-reward-maximization/

https://github.com/nandodef/love4all-ai/tree/main/docs/files

38d4.7K1714

Thomas Tao@Thomas_Tao_1

@NandoDF I keep thinking this moves the problem to feedback quality. Bad loops still teach weird rewards.

38d431

Hassan Al-Farhan@HAF_tech

@NandoDF Promising direction. But I doubt reward engineering disappears. We just move it into the interaction protocol and data assumptions.

38d29

Invincible@InvincibleEdge

@NandoDF i hope that day comes before my portfoliocscount needs reward engineering to forgive me

38d43

Rugbist@rugbist_

@NandoDF interesting how observational needs way more data just to catch up, but interventional stabilizes earlier

isnt that the whole debate on how much label quality matters vs quantity

38d34

Nando de Freitas@NandoDF

You’re right. There’s more to the story though.

The last two experiments address robustness to teacher noise and to quantity of teacher data. The causal agent is very effective at learning with few teacher examples.

More philosophically, a causal agent in the real world would experience enough signals to learn who is a good teacher for a task. I’d like to demonstrate this next, anyone keen to help @AdaptiveAgents and me?

38d22

tsunami_crypto@ls_brd

@NandoDF so interventional data is just way more sample efficient

38d18

Invincible@InvincibleEdge

@NandoDF so the interventional approach gets reliable signal even with rare teacher actions. that feels like the practical edge.

38d16

Nando de Freitas@NandoDF

@HAF_tech Hopefully into agents continually learning via interaction and minimal engineering.

38d7