Still, no Western text-to-video model comes close to Seedance 2.0, and Seedance 2.5 is already on its way.
There are certainly several explanations for this. A common claim is that Seedance has access to a vast amount of video material and does not take copyright protection all that seriously. That is a vague assumption, and honestly, I cannot imagine it being the only reason. Google, after all, has YouTube, a platform with countless videos that could surely be used to train good models. Just remember when Mira Murati was asked how Sora had been trained and whether YouTube videos had been used for it.
Be that as it may, the more interesting question is why there seems to be so little interest in, and focus on, video models. My assumption is that they are simply not relevant. They are basically a nice gimmick, but currently negligible in the race for the best models. More specifically, LLMs are making outstanding progress in important areas such as software engineering (SWE), and that matters so much more for winning overall that labs would rather not spend compute on video models. OpenAI has reportedly ended Sora completely for the moment.
Maybe the more important point is that consumer video is probably not the real endgame for AI video models. Yes, they are useful for creators, ads, short-form content and entertainment, and for ByteDance this obviously fits perfectly into CapCut, Dreamina and TikTok. But strategically, the bigger reason to train these systems may be that video is one of the richest training signals we have for learning the dynamics of the physical world: motion, causality, object permanence, spatial consistency and interaction. In that sense, video models are not just content generators, but early world models (see Google and NVIDIA). Or in short: for Western labs, consumer-facing AI video is too cost-inefficient and offers too little real benefit.
That is why I think we are currently seeing hardly any change in this area.
Coming soon: Dreamina Seedance 2.5 is arriving on CapCut.
Seamless generation and editing. Up to 50 multimodal references. 30-second scenes in one shot. Finer creative control. More reliable results. It's built to make creating faster, smoother, and more intuitive.
Whether you're creating animations, short dramas, social content, marketing videos or something entirely new, the next generation of AI video creation is almost here.
And it's coming to CapCut across Web, Desktop, and Mobile.
Stay tuned.














