@EthanHe_42 worked with me and @imhaotian for half a year on Grok Imagine. I don't think he was intentionally overclaiming here, but the internet narrative quoting him as the "lead" was weird. It was really driven by @imhaotian most of the time.
Good times, especially that intense three-month sprint at the start. Huge credit goes to @imhaotian and several other key people like @zeliu_ @jathushan @ZhibeiM @hexiang @JackCaiXun @chaitu, and later @jia_xuhui @YknZhu, along with many others, especially on the latest Grok Imagine 1.5.
On the @latentspacepod podcast, I shared my views on video generation, world models, LLMs, agents, continual learning, and where the next frontier is.
1. Video models get most of their intelligence from language, not from video data.
2. Idea-to-code is fast now. The bottleneck is back to having enough compute to try every idea.
3. Iteration speed beats almost everything else in model development.
4. The next leap won't be a better video model. It'll be a video agent.
5. Diffusion will be the frontend of AGI, the LLM the backend. Generative UI will replace HTML/CSS: user intent straight to pixels.
6. Physical embodiment may become a tool a powerful AI picks up. Robotics may get solved by video-capable LLMs.
7. Continual learning may look like models that manage their own context, and even rewrite their own harness at test time.
Thanks @swyx and @vibhuuuus for having me https://www.youtube.com/watch?v=jPtQlILfkhA