u/Impressive-Manner963

Guys, a lot of people are getting frustrated with Gemini Omni right now, and honestly I think it’s because people are still trying to use the same prompting style we use for VEO 3.1, Seedance, Kling, and all the other tools.

But Omni came with a completely different approach. It works much more like a conversation model.

It doesn’t interpret descriptive prompts the way we’re used to. Instead, it seems to work through physical interpretation: it translates what you describe into how the world in the scene would actually behave.

For example: imagine a car driving at 200 km/h. In older models, you would simply write: “The car is driving at 200 km/h.”

But Omni doesn’t really understand that kind of abstract numeric information the same way.

Instead, you need to describe the physical sensation of the scene:

how the environment behaves
how fast objects move past the camera
vibration
wind pressure
motion blur
suspension movement
the feeling of speed itself

Basically, you now have to translate human concepts into physical world behavior.
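
If you build prompts in a script, the idea above can be sketched roughly like this. It's just plain string handling, not any Omni or Google API: the helper name physical_prompt and the cue wording are made up for illustration, and I'm assuming the prompt gets sent as ordinary text.

```python
def physical_prompt(subject, cues):
    """Join a subject and a list of physical cues into one prompt string.

    Hypothetical helper for illustration only; it does not call any model.
    """
    # Drop trailing periods so the join below doesn't double them up.
    parts = [subject.strip().rstrip(".")]
    parts.extend(c.strip().rstrip(".") for c in cues if c.strip())
    return ". ".join(parts) + "."


# Instead of "The car is driving at 200 km/h", describe what that speed
# looks and feels like. These cues are examples, not tested wording.
speed_cues = [
    "Roadside trees and guardrails streak past the camera",
    "The cabin vibrates and the steering wheel trembles",
    "Wind pressure flattens the grass along the shoulder",
    "Heavy motion blur on everything except the car body",
    "The suspension compresses hard over every bump",
]

prompt = physical_prompt("A sports car on an open highway", speed_cues)
print(prompt)
```

The number never appears in the final prompt. Everything the model gets is something it could actually show on screen.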

And I think that’s exactly why so many people are struggling right now. Videos are coming out broken, inconsistent, weird, or simply not working properly because people are still prompting the old way.

After reading a lot of Google and DeepMind documentation and running several research sessions comparing outputs from Claude, GPT, Qwen, Perplexity, and other sources, I found one main pattern:

You need to translate human data into physical data.

GEMINI OMNI LANGUAGE