Microsoft VP of AI Nando de Freitas proposes replacing complex training pipelines with unified, continual interactive causal agents

The single stream replaces separate SFT and RLHF objectives

Original post
Nando de Freitas @NandoDF · #28 in AI

The field of AI is at a local minimum. Not a local minimum in architectures and models, but a local minimum on how we train: a Frankenstein multi-stage approach. In this new blog entry, I propose a different route based on continual interaction and causality.

https://love4all.ai/blog/continual-interactive-causal-agents/

4:50 AM · Jun 7, 2026 · 2.8K Views
Sentiment

Users are praising Nando de Freitas's single-stream interactive training proposal for AI agents because they find the causality take interesting and the overall direction worth trying.

Positive: 100.0% · Negative: 0.0%
4 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views 746 · Bookmarks 1 · Likes 3 · Replies 2
Michael Black @Michael_J_Black

@NandoDF I like this. Directionally, it feels right.

2h · Views 746 · Likes 3 · Bookmarks 1
Pim de Witte @PimDeWitte

@Michael_J_Black @NandoDF So basically world models 😜

1h · Views 41 · Likes 1
Pim de Witte @PimDeWitte

@NandoDF @Michael_J_Black I view all of those things as just text conditioning and steerability on a WM architecture. What you’re describing here is precisely the original promise (and reason) WMs are being pursued so hard. In case you haven’t read it yet: https://www.notboring.co/p/world-models

1h · Views 8

@PimDeWitte @Michael_J_Black Precisely not. This is not about model architectures, which is what people often stress when talking about world models. This works with JEPA or GPT. This is about causal interactive training. It’s all about environments, not agents.

1h · Views 16

@Michael_J_Black Thanks, Michael. It’s still not very developed or properly tested, but I agree that directionally, it feels worth trying

1h · Views 196 · Likes 1 · Bookmarks 0
Pim de Witte @PimDeWitte

@NandoDF @Michael_J_Black P(S’|A,S) defines how the world evolves, P(S’|do(A),S) is how you train general agents inside those WMs. They supplement each other in the loop you lay out. I see your distinction though - two sides of the same coin

55m · Views 22 · Likes 1
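
To make the distinction between P(S’|A,S) and P(S’|do(A),S) in the reply above concrete, here is a minimal sketch in Python. It is not taken from either blog post. It assumes a toy world with a hidden binary confounder `U` that influences both the logged action and the next state, holds the current state S fixed (so it is omitted), and uses arbitrary illustrative probabilities. All function names are hypothetical.

```python
# Toy model (all numbers are illustrative assumptions, not from the source).
# U is a hidden confounder; the current state S is held fixed and omitted.
P_U = {0: 0.5, 1: 0.5}


def policy(a, u):
    """P(A=a | U=u): the logging policy prefers A=1 when U=1."""
    p_a1 = 0.9 if u == 1 else 0.1
    return p_a1 if a == 1 else 1.0 - p_a1


def next_state_prob(a, u):
    """P(S'=1 | A=a, U=u): both the action and the confounder help."""
    return 0.2 + 0.2 * a + 0.5 * u


def observational(a):
    """P(S'=1 | A=a): condition on seeing action a in logged data."""
    joint = sum(P_U[u] * policy(a, u) * next_state_prob(a, u) for u in P_U)
    marginal = sum(P_U[u] * policy(a, u) for u in P_U)
    return joint / marginal


def interventional(a):
    """P(S'=1 | do(A=a)): force action a, so U keeps its prior."""
    return sum(P_U[u] * next_state_prob(a, u) for u in P_U)


if __name__ == "__main__":
    print(f"P(S'=1 | A=1)     = {observational(1):.2f}")
    print(f"P(S'=1 | do(A=1)) = {interventional(1):.2f}")
```

Under these assumptions the observational estimate overstates the effect of the action, because seeing A=1 in the logs also makes U=1 more likely. When the agent chooses its own actions in an environment, that dependence is cut, which is one way to read the argument that the two quantities only coincide when there is no confounding.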

@NandoDF multi-stage pipeline feels like duct taping training phases together

the interventional agent is cleaner but do we have the infra to pull it off

2h · Views 34
Blissy @BlissyOnX

@NandoDF the single stream feels inevitable tbh, but the gap between proposing it and making it work is where the real friction lives

2h · Views 25
Strata @ChainZenit

@NandoDF this take on causality is super interesting, how did you start?

2h · Views 24

@NandoDF What is the conceptual difference between stages 2 and 3, or 5 and 6?

15m · Views 21
Rugbist @rugbist_

@NandoDF single stream agent approach would def reduce all the prompt engineering hell we deal with now

question is what happens to fine tuning in that setup

2h · Views 19
Alex YGift @Radipdegen

@NandoDF hard to disagree that multi-stage feels like patching a leak with more patches

wonder if the compute budget holds up in practice though

2h · Views 16
Invincible @InvincibleEdge

@NandoDF curious how u see continual interaction scaling in practice tho

most labs cant even keep one training run stable

2h · Views 12
Guilherme O'Tina @guilhermeotina

the multi-stage pipeline is ugly but it solves a real data geometry problem. pretraining needs density (next token prediction on web text), alignment needs human preferences, rl needs environment rollouts. collapsing them into one stream means the environment has to provide all three signal types at the right ratios. that feels less like a learning algorithm problem and more like an environment design problem

1h · Views 11
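
The "right ratios" point in the comment above can be sketched as a weighted sampler over signal types. This is only an illustration of the environment-design framing, not anything proposed in the blog post: the three type labels, the `mixed_stream` helper, and the ratios are all hypothetical.

```python
import random
from collections import Counter


def mixed_stream(ratios, n, seed=0):
    """Draw n signal-type labels with probability proportional to ratios.

    ratios maps a hypothetical signal-type name to a non-negative weight.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    kinds = list(ratios)
    weights = [ratios[k] for k in kinds]
    return rng.choices(kinds, weights=weights, k=n)


# Illustrative mix only; the appropriate ratios are an open question.
ratios = {
    "next_token": 0.80,
    "human_preference": 0.05,
    "environment_rollout": 0.15,
}
stream = mixed_stream(ratios, 10_000)
counts = Counter(stream)

if __name__ == "__main__":
    for kind, count in counts.most_common():
        print(f"{kind}: {count}")
```

In a single-stream setup, choosing and scheduling these weights would be the environment's job rather than something fixed by separate pipeline stages, which is the trade-off the comment describes.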

@PimDeWitte @Michael_J_Black P(S’|A,S) in the blog you shared is wrong. It should be P(S’|do(A),S). Our blog explains why.

1h · Views 1

@PimDeWitte @Michael_J_Black This is a solution for how world models should be trained so they become proper causal agents. This solution was developed by @AdaptiveAgents and myself over the years. We are familiar with the literature. Thanks for the link.

1h · Views 1