Who else is hyped about merging 3D and AI for video creation?
I'm dreaming of an AI creation tool that builds a unified 3D scene graph underpinning your story world, with diffusion applied on top like a final render pass or fancy coat of paint.
This 'dynamic' 3D scene graph would encapsulate characters, environments, and interactions. Toss it all into a large context window, and use multimodal prompting to edit intuitively and breathe life into it.
You'd be in 'director mode,' directing your 'AI talent' the way folks already use ChatGPT's Advanced Voice Mode for improv or line delivery.
Guide your virtual 'talent', capture the best takes in a digital studio, then switch to 'editing mode' to piece it all together.
The kicker? You'd retain the flexibility to go back and make changes on a per-shot or world level.
You get all the perks of a 3D engine - cause/effect relationships, physics simulations, spatially-anchored audio - while still transforming your 'grey-boxed' world into stunning final pixels.
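To make the per-shot vs. world-level idea concrete, here's a rough sketch of how I picture it. Everything in it (SceneNode, Shot, resolve, the property names) is hypothetical and made up for illustration, not taken from any real engine or tool, and the diffusion 'render pass' is left out entirely. The world holds the shared state, and each shot only stores what it changes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SceneNode:
    """Hypothetical scene graph node: a character, prop, or environment."""
    name: str
    properties: dict[str, Any] = field(default_factory=dict)
    children: list["SceneNode"] = field(default_factory=list)

    def find(self, name: str) -> "SceneNode | None":
        """Depth-first search for a node by name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None


@dataclass
class Shot:
    """Hypothetical shot: stores only its differences from the shared world."""
    label: str
    overrides: dict[str, dict[str, Any]] = field(default_factory=dict)


def resolve(world: SceneNode, shot: Shot, node_name: str) -> dict[str, Any]:
    """Merge world-level properties with this shot's overrides (shot wins)."""
    node = world.find(node_name)
    if node is None:
        raise KeyError(node_name)
    return {**node.properties, **shot.overrides.get(node_name, {})}


# Assumed example world: one character and one environment.
world = SceneNode("world", children=[
    SceneNode("hero", {"pose": "standing", "outfit": "red coat"}),
    SceneNode("street", {"time_of_day": "noon"}),
])
shot_1 = Shot("wide")
shot_2 = Shot("close-up", overrides={"hero": {"pose": "kneeling"}})

# World-level change: every shot picks it up.
world.find("hero").properties["outfit"] = "blue coat"

print(resolve(world, shot_1, "hero"))  # world state only
print(resolve(world, shot_2, "hero"))  # world state plus the shot's own pose
```

Change the outfit once at the world level and both shots inherit it, while the close-up keeps its own pose. That's the kind of non-destructive editing I'd want under the hood.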
Eventually, you'd don an Apple Vision Pro to orchestrate the whole thing - feeling like James Cameron on the set of Avatar.
Critically, such a tool would keep you in a flow state instead of chaotically jumping between an ill-fitting patchwork of tools (the real multiverse of madness). That fragmentation is why the current set of AI video tools is better suited to making memes than telling stories.
Anyone actually building this?