spent an hour with Gemini Omni in YouTube Shorts: the "conversational edit" thing is the actual story (not the model quality)
so I spent about an hour yesterday playing with Omni in Shorts after the I/O drop and I'm still not totally sure what to make of it.
going in I assumed it was going to be another text-to-video model where you prompt, wait, get something close-ish, re-prompt, wait again, eventually rewrite the prompt for the fifth time. that's been my loop with every video tool since Veo. tried Kling, tried Seedance, same painful cycle every time.
Omni does a thing I don't think people are talking about enough. you generate a clip and then you just talk to it. "pull the camera back." "change the background to a kitchen." "keep her face the same but make her look annoyed." it actually keeps the character across the edit. it doesn't regenerate from scratch and give you a different person who kinda resembles the last one.
now I'm not going to sit here and tell you the frame quality beats Seedance 2.0 because it doesn't. the testers calling it a tier below are right, you can see it especially on faces in motion. that's real and I won't pretend otherwise.
but the thing I think actually matters for creativity, and not for benchmark posts, is this: the "prompt and pray" loop has been the bottleneck, not the model quality. if I can direct a mediocre model the way an editor directs an actor, I'll take that over a beautiful model where I have to bribe it with the perfect prompt 40 times to get one usable shot.
also the YouTube Shorts Remix thing where you step into someone else's Short for free is kind of insane as a distribution move? Google just dropped a generative video tool in front of a few billion people without a paywall. that's going to move AI video adoption more than any benchmark bump.
stuff I haven't figured out yet:
- the 10-second cap bites fast; you feel it by the second prompt
- no API, so if you wanted to actually build on this you can't (yet)
- the avatar feature where you become a character was fun for about 4 minutes and then I started feeling slightly weird about it, idk
genuinely curious if anyone here has been on it long enough to find where it breaks. specifically wondering how multi-shot consistency holds past 3-4 clips, because that's where Veo always falls apart on me.