AI · 2h ago

Microsoft VP of AI Nando de Freitas proposes replacing complex training pipelines with unified, continual interactive causal agents

The single stream replaces separate SFT and RLHF objectives

#28

Original post

Nando de Freitas @NandoDF · #28 in AI

The field of AI is at a local minimum. Not a local minimum in architectures and models, but a local minimum on how we train: a Frankenstein multi-stage approach. In this new blog entry, I propose a different route based on continual interaction and causality.

https://love4all.ai/blog/continual-interactive-causal-agents/

4:50 AM · Jun 7, 2026 · 2.8K Views

Sentiment

Users are praising Nando de Freitas's single-stream interactive training proposal for AI agents because they find the causality take interesting and the overall direction worth trying.

Positive: 100.0% · Negative: 0.0%

4 comments with sentiment.

Cluster Engagement

Posts from X

Views: 746 · Bookmarks: 1 · Likes: 3 · Replies: 2

Michael Black@Michael_J_Black

@NandoDF I like this. Directionally, it feels right.

2h · 746 views · 3 likes · 1 bookmark

Pim de Witte@PimDeWitte

@Michael_J_Black @NandoDF So basically world models 😜

1h411

Pim de Witte@PimDeWitte

@NandoDF @Michael_J_Black I view all of those things as just text conditioning and steerability on a WM architecture. What you’re describing here is precisely the original promise (and reason) WMs are being pursued so hard. In case you hadn’t read yet: https://www.notboring.co/p/world-models

1h8

Nando de Freitas@NandoDF

@PimDeWitte @Michael_J_Black Precisely not. This is not about model architectures, which is what people often stress when talking about world models. This works with JEPA or GPT. This is about causal interactive training. It’s all about environments, not agents.

1h16

Nando de Freitas@NandoDF

@Michael_J_Black Thanks, Michael. It’s still not very developed or properly tested, but I agree that directionally, it feels worth trying

1h19610

Pim de Witte@PimDeWitte

@NandoDF @Michael_J_Black P(S’|A,S) defines how the world evolves, P(S’|do(A),S) is how you train general agents inside those WMs. They supplement each other in the loop you lay out. I see your distinction though - two sides of the same coin

55m221
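The difference between P(S'|A,S) and P(S'|do(A),S) discussed in this exchange can be made concrete with a toy calculation. The sketch below is not taken from the blog or the thread: the hidden confounder U, the logged policy, and every probability are made-up assumptions, and the state S is held fixed and left out for brevity.

```python
# Hypothetical toy model; all numbers are invented for illustration.
P_U = {0: 0.5, 1: 0.5}           # prior over a hidden confounder U
P_A1_GIVEN_U = {0: 0.1, 1: 0.9}  # logged policy: P(A=1 | U)


def p_next(a, u):
    """P(S'=1 | A=a, U=u): the outcome depends on the action and on U."""
    return 0.2 + 0.1 * a + 0.6 * u


def p_a_given_u(a, u):
    """P(A=a | U=u) under the logged policy."""
    p1 = P_A1_GIVEN_U[u]
    return p1 if a == 1 else 1.0 - p1


def observational(a):
    """P(S'=1 | A=a): seeing A=a in logged data also tells us about U."""
    joint = {u: P_U[u] * p_a_given_u(a, u) for u in P_U}
    norm = sum(joint.values())
    return sum(p_next(a, u) * joint[u] / norm for u in P_U)


def interventional(a):
    """P(S'=1 | do(A=a)): we set A ourselves, so U keeps its prior."""
    return sum(p_next(a, u) * P_U[u] for u in P_U)


if __name__ == "__main__":
    for a in (0, 1):
        print(a, round(observational(a), 2), round(interventional(a), 2))
```

Under these assumed numbers, conditioning gives P(S'=1|A=1) = 0.84 while intervening gives P(S'=1|do(A=1)) = 0.60. The logged data makes the action look far more effective than it is (an apparent effect of 0.58 against a causal effect of 0.10), because the action and the outcome share the cause U.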

tsunami_crypto@ls_brd

@NandoDF multi-stage pipeline feels like duct taping training phases together

the interventional agent is cleaner but do we have the infra to pull it off

2h34

Blissy@BlissyOnX

@NandoDF the single stream feels inevitable tbh, but the gap between proposing it and making it work is where the real friction lives

2h25

Strata@ChainZenit

@NandoDF this take on causality is super interesting, how did you start?

2h24

Roei Herzig ✈️ CVPR@roeiherzig

@NandoDF What is the conceptual difference between stages 2 and 3, or 5 and 6?

15m21

Rugbist@rugbist_

@NandoDF single stream agent approach would def reduce all the prompt engineering hell we deal with now

question is what happens to fine tuning in that setup

2h19

Alex YGift@Radipdegen

@NandoDF hard to disagree that multi-stage feels like patching a leak with more patches

wonder if the compute budget holds up in practice though

2h16

Invincible@InvincibleEdge

@NandoDF curious how u see continual interaction scaling in practice tho

most labs cant even keep one training run stable

2h12

Guilherme O'Tina@guilhermeotina

the multi-stage pipeline is ugly but it solves a real data geometry problem. pretraining needs density (next token prediction on web text), alignment needs human preferences, rl needs environment rollouts. collapsing them into one stream means the environment has to provide all three signal types at the right ratios. that feels less like a learning algorithm problem and more like an environment design problem

1h11
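The point about supplying all three signal types "at the right ratios" can be pictured as a weighted interleaving problem. The following is only a minimal sketch under stated assumptions: the function name `single_stream`, the three labels, and the 90/5/5 split are hypothetical and do not come from the blog or the thread.

```python
import random
from collections import Counter


def single_stream(ratios, n_steps, seed=0):
    """Yield n_steps signal-type labels, sampled in proportion to ratios."""
    rng = random.Random(seed)  # seeded so the stream is reproducible
    kinds = list(ratios)
    weights = [ratios[k] for k in kinds]
    for _ in range(n_steps):
        yield rng.choices(kinds, weights=weights, k=1)[0]


# Hypothetical mix: mostly dense next-token data, with a little
# preference feedback and a few environment rollouts.
RATIOS = {"next_token": 0.90, "human_preference": 0.05, "env_rollout": 0.05}

if __name__ == "__main__":
    print(Counter(single_stream(RATIOS, 10_000)))
```

In this framing the learner consumes one stream, and the open design question moves into `RATIOS`: who sets the mix and how it should change over time.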

David Brillembourg Capriles@dbrillembourg

@PimDeWitte @Michael_J_Black @NandoDF 🚀

1h8

Nando de Freitas@NandoDF

@ls_brd Yes, it can be parallelised.

1h1

Nando de Freitas@NandoDF

@PimDeWitte @Michael_J_Black P(S’|A,S) in the blog you shared is wrong. It should be P(S’|do(A),S). Our blog explains why.

1h1

Nando de Freitas@NandoDF

@PimDeWitte @Michael_J_Black This is a solution for how world models should be trained so they become proper causal agents. This solution was developed by @AdaptiveAgents and myself over the years. We are familiar with the literature. Thanks for the link.

1h1