Nando de Freitas unveils interventional SFT for agent models
Nando de Freitas, vice president of AI at Microsoft, outlined interventional SFT, a method to stop agentic language models from confirming their own delusions. It modifies supervised fine-tuning to exclude action tokens from the loss and train only on observation tokens. In experiments with over 30 prompts, the method selected factual continuations more reliably than conventional fine-tuning. Supporting code and examples appear on love4all.ai and GitHub.
One line of code is all it takes to prevent LLM agent delusions, instead of post-training patches like RL.
https://love4all.ai/blog/why-it-is-important-to-understand-causality-and-agency/ ❤️ 4 ∀
https://github.com/nandodef/love4all-ai/tree/main/docs/files
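For readers curious what that "one line" looks like, here is a minimal sketch of the idea in a PyTorch-style training setup. The function name, the `is_action` mask, and the tensor layout are illustrative assumptions, not code from the love4all.ai repo; the load-bearing idea is simply that action tokens receive the ignored label, so only observation tokens contribute to the loss.

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # cross_entropy skips targets with this label

def interventional_sft_loss(logits, input_ids, is_action):
    """SFT loss that treats the agent's own actions as interventions:
    actions stay in the conditioning context but are never targets.

    logits:    (batch, seq, vocab) model outputs
    input_ids: (batch, seq) token ids
    is_action: (batch, seq) bool mask; True where the agent emitted
               the token (action), False where the environment did
               (observation). Hypothetical mask, see lead-in above.
    """
    labels = input_ids.clone()
    labels[is_action] = IGNORE_INDEX  # the "one line": ignore action tokens
    # Standard next-token shift: position t predicts token t+1.
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        labels[:, 1:].reshape(-1),
        ignore_index=IGNORE_INDEX,
    )
```

Note the design point: action tokens are still fed to the model as input; they just carry no gradient as prediction targets, which is what distinguishes this from simply deleting actions from the trajectory.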

The road here wasn’t easy. It started with our work on delusions with @AdaptiveAgents @ShaneLegg @scott_e_reed and many other bright scientists: https://arxiv.org/pdf/2110.10819
But instead of counterfactual learning, the theory of universal imitation as a route to agency provided the foundation: https://adaptiveagents.org/universal_ai_as_imitation
The research was accelerated by @OpenAI GPT5.5 and Codex. When I ran out of Pro credits 😅 I switched to @AnthropicAI Claude. I wish there were special LLM licenses for academic work @gdb @sama @DarioAmodei 🙏
The bottleneck for research these days is computational resources/energy. I’m glad that startups like @cusp_ai are addressing the energy challenges.
This research was possible thanks to my @CIFAR_News fellowship - the 🇨🇦 gift that keeps on giving - and my adjunct/associated professorships @UBC_CS and @CompSciOxford
Very excited about this! Just fine-tune on the observation tokens and ignore the action ones to treat the agent's output as a causal intervention.
This is one of those moments when I'm surprised the maths works in practice 😅.
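One way to see why the masking works (my notation, paraphrasing the thread's causal framing, not formulas from the blog post): standard SFT trains on every token of a trajectory, so the model learns to predict, and hence treat as evidence, its own past actions. Interventional SFT restricts the sum to the observation indices $\mathcal{O}$.

```latex
% Standard SFT: all tokens, actions included, are prediction targets.
\mathcal{L}_{\mathrm{SFT}}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})

% Interventional SFT: only observation tokens are targets; actions
% remain in the context, the sequence-model analogue of conditioning
% on do(a_t) rather than on a_t as evidence.
\mathcal{L}_{\mathrm{int}}(\theta) = -\sum_{t \in \mathcal{O}} \log p_\theta(x_t \mid x_{<t})
```

Dropping the action terms from the sum is exactly the label-masking line in the sketch above: the model never fits $p(a_t \mid x_{<t})$ on its own rollouts, so a generated action cannot later serve to confirm the belief that produced it.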