Users are praising Microsoft's Mirage and its Latent Spatial Memory technique for video world models because it delivers major efficiency gains: much faster generation and far lower memory use.
Microsoft Research introduces Mirage
Latent spatial memory stores 3D scenes directly as latent tokens, skipping the costly RGB render-and-reencode loop. The result is up to 10.57x faster video generation, 55x lower memory use, and state-of-the-art consistency on WorldScore.

Paper: https://huggingface.co/papers/2606.09828
Paper: https://paperswithcode.co/paper/2606.09828
Project: https://aka.ms/latent-spatial-memory/
Code: https://github.com/microsoft/LatentSpatialMemory

@_akhaliq Thanks @_akhaliq! Author thread here; explore more at: https://aka.ms/latent-spatial-memory

@_akhaliq Real question is whether this scales beyond demos or hits the same wall as every other world model.

@HuggingPapers Mirage stores 3D scenes as latent tokens, skipping the costly RGB render-and-reencode loop. Up to 10.57x faster generation, 55x lower memory, state-of-the-art consistency on WorldScore. #MicrosoftResearch

@_akhaliq Please check out the SO-101 benchmark I posted!

@Rangfeng1117 @_akhaliq You can try it once we release the code and checkpoints!