Ethan He, former xAI world model lead, compares current video AI to early autocomplete and predicts LLMs will control interactive video environments

He also detailed how xAI developed Grok Imagine

Original post
Latent.Space (@latentspacepod)

🆕Grok Imagine’s Video Agent Moment: Cosmos, xAI, World Models, Generative UI, & the Codex Phase for Video!

https://www.latent.space/p/video-agents

@EthanHe_42, former @xai world model lead and @nvidia Cosmos researcher, explains why AI video may follow the same path as coding agents, how Grok Imagine went from zero to one, why text-to-video is only the autocomplete phase, how world models become real-time and interactive, why language models may become the control layer for video, and why the future of AI video may look less like a prompt box and more like an agent with a camera, editor, timeline, and tool belt.

8:45 AM · Jun 1, 2026 · 28.4K Views
Posts from X
Views: 29K · Bookmarks: 140 · Likes: 132 · Retweets: 8 · Replies: 25
swyx (@swyx)

This pod was an incredible gift to the community:

not only our first pod about @xAI, but Ethan really indulged on all our questions on how to train a SOTA Videogen world model, including specific areas (consistent extending/editing, voice) that Grok @Imagine is *still* SOTA,

on top of the factual overviews he ALSO came loaded with opinions/predictions:

- why he's quitting Videogen for LLMs: video models get most of their intelligence from LLMs, not from scaling video data
- why the next frontier for videogen also happens to be video agent models: agentic models trained to orchestrate video models
- why deterministic compression (like MP4) is a useless target vs VAE compression
- Videomaxxing: if you truly believe in the "Moore's law" of AI/genmedia, then video models become the final boss UI of everything, like Flipbook (below)
