This video is a master class in “I told you so” delivered with a smile :)
1. Modern AI didn’t discover world models… it just industrialized ancient philosophy.
2. @SchmidhuberAI’s 1990 setup had two parts: a world model (M) that predicts everything (including pain/reward) and a controller (C) that plans action sequences inside the model’s latent space. Today’s foundation models are just the prediction machine. Without the curiosity-driven controller exploiting that model, you’re still missing real intelligence.
3. Naïve prediction-error curiosity gets you glued to a noisy TV forever. The fix (already working in 1991) is compression progress: reward the agent for discovering simpler explanations of the data it generates. Scientists, artists, and babies already run on this. Most modern RL agents still don’t.
4. Fast-weight programmers (linear transformers), pre-training (the P in ChatGPT), neural distillation/collapse, deep residual learning, and predictability maximization in latent space all came out of @SchmidhuberAI’s lab decades before the Transformer paper or ResNet. The 2020s boom is mostly re-implementing 1990s ideas on 10-million-times-cheaper hardware.
5. The blueprint: controller and world model fused via continual distillation, planning via RL prompt engineering in abstract latent space, and curiosity measured by compression progress. Schmidhuber described it in 2015–2018. DeepSeek’s recent shock-the-market trick? Looks suspiciously like exactly that.
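To make point 3 concrete, here is a toy sketch of the two curiosity signals side by side. It is not code from the talk or from any of the cited papers: `BitPredictor`, `run`, `mean_tail`, the 32-step window, and the two bit streams are all hypothetical choices made for illustration. The "world model" is just a table of smoothed counts, and compression progress is approximated as the number of bits saved on recent history by one model update.

```python
import math
import random


class BitPredictor:
    """Toy world model (assumption): P(next bit | previous bit) from smoothed counts."""

    def __init__(self):
        # counts[prev][bit], starting at 1 (Laplace smoothing)
        self.counts = [[1, 1], [1, 1]]

    def prob(self, prev, bit):
        row = self.counts[prev]
        return row[bit] / (row[0] + row[1])

    def code_length(self, pairs):
        """Bits needed to encode the (prev, bit) pairs under the current model."""
        return sum(-math.log2(self.prob(p, b)) for p, b in pairs)

    def update(self, prev, bit):
        self.counts[prev][bit] += 1


def run(stream, window=32):
    """Return one (prediction_error, compression_progress) pair per step."""
    model = BitPredictor()
    history, rewards, prev = [], [], 0
    for bit in stream:
        # Naive curiosity: how surprising was this bit?
        surprise = -math.log2(model.prob(prev, bit))
        history.append((prev, bit))
        recent = history[-window:]
        # Compression progress: bits saved on recent history by this update.
        before = model.code_length(recent)
        model.update(prev, bit)
        after = model.code_length(recent)
        rewards.append((surprise, before - after))
        prev = bit
    return rewards


def mean_tail(rewards, index, tail=200):
    """Average of one reward signal over the last `tail` steps."""
    return sum(r[index] for r in rewards[-tail:]) / tail


rng = random.Random(0)
steps = 2000
noisy_tv = [rng.randint(0, 1) for _ in range(steps)]  # unlearnable noise
pattern = [t % 2 for t in range(steps)]               # learnable regularity

tv = run(noisy_tv)
pat = run(pattern)

for name, rewards in (("noisy TV", tv), ("pattern", pat)):
    print(f"{name}: late prediction error = {mean_tail(rewards, 0):.3f} bits, "
          f"late compression progress = {mean_tail(rewards, 1):.4f} bits")
```

In this setup the noisy TV should keep paying out roughly one bit of prediction error per step no matter how long the agent watches, while its compression progress hovers near zero because nothing more can be learned. The alternating pattern pays compression progress only while the model is still improving, then both signals fade, which is the "get bored and move on" behavior the compression-progress reward is meant to produce.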
Video of my opening keynote for the 2026 World Modeling Workshop at Mila, Quebec AI Institute: simple but powerful ways of using world models and their latent space (sorry for the sound problem at 0:20-0:43). Details and references in: The Neural World Model Boom https://people.idsia.ch/~juergen/world-model-boom.html