You can bootstrap your agent quickly with the Omni API using the skill we published:
https://github.com/google-gemini/gemini-skills
It includes:
- video editing
- text to video
- video generation with image references
- first frame to video
But it also has some helper tools for:
- prepping input videos for editing (10s, 720p)
- audio stripping if you want to generate new audio
- video inspection
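If you're curious what that prep step roughly looks like, here's a minimal sketch using plain ffmpeg from Python. This is not the skill's actual code: the function names (`prep_command`, `strip_audio_command`, `run`) are made up for illustration, and it assumes ffmpeg is installed and that "10s, 720p" means trimming to the first 10 seconds and scaling to 720 pixels high.

```python
import subprocess


def prep_command(src: str, dst: str, max_seconds: int = 10, height: int = 720) -> list[str]:
    """Build an ffmpeg command that trims and downsizes a clip (hypothetical helper)."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-t", str(max_seconds),        # keep only the first max_seconds seconds
        "-vf", f"scale=-2:{height}",   # set the height, keep aspect ratio, force an even width
        dst,
    ]


def strip_audio_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that drops the audio track and copies the video stream."""
    return ["ffmpeg", "-y", "-i", src, "-an", "-c:v", "copy", dst]


def run(cmd: list[str]) -> None:
    """Run a command and raise if ffmpeg exits with an error."""
    subprocess.run(cmd, check=True)
```

Something like `run(prep_command("raw.mp4", "input.mp4"))` followed by `run(strip_audio_command("input.mp4", "silent.mp4"))` would give you a short, silent clip to hand to the model. The helpers in the repo may do this differently, so treat it as a starting point.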
Omni Flash is a smart model. The way the hand is wet, the water ripples, the refraction, the shadows, the sound effects 🤯
> Change the table to be a shallow pool of water
I'm excited to see what y'all build now that it's available in the API. The editing capabilities of this model were made for cool pipelines.



