I have been getting a lot of questions about my AI ads and how to create them. Usually a discussion erupts under these posts about whether to use Seedance or Sora, whether Kling 3 is better than Kling 2.6, what about Grok Imagine, and whether to pick Nano Banana 2, Nano Banana Pro, or GPT Image 2.
I understand why keeping track of all of them can be overwhelming, but I have been working with AI since day 1, so they all make sense to me now.
In this post I will break down all of the models in relation to marketing, which is what I use AI for. I am a performance marketer with ecom clients, and this is how I use each model, including tips, tricks, and some prompting strategies.
Models covered: Sora 2 Pro, Kling 3, Kling 3 OMNI, Kling 2.6, Seedance 2, Seedance 1.5 Pro, VEO 3.1, Nano Banana 2 and Pro, Grok Imagine
Sora 2 Pro
The GOAT of AI video generation and the best model for marketing right now. It is the only model that can generate authenticity well: all of those small imperfections, the shaky camera, the skin realism. Every other model produces polished skin that reflects light unnaturally.
Best use case: Self-recorded camera style UGC, talking into the camera. When you need A-roll of a person talking, this is the best model for it. But it does not scale well because of the randomness. When it hits, though, it is the best result by far.
It is also the most expensive model. You can get to $1 per second easily on max settings.
It is also the most random. With other models if you put in the same prompt and image 10 times, 8 out of 10 will be pretty similar. With Sora it is very hit or miss.
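To make the cost of that randomness concrete, here is a rough sketch of the math. The $1 per second figure is the one mentioned above; the 15 second clip length and both hit rates are made-up inputs for illustration, not measurements, and the function name is my own.

```python
def expected_cost_per_usable_clip(price_per_second, clip_seconds, hit_rate):
    """Average spend to get one usable clip when only some generations land."""
    if not 0 < hit_rate <= 1:
        raise ValueError("hit_rate must be in (0, 1]")
    cost_per_attempt = price_per_second * clip_seconds
    # With independent attempts, the expected number of tries is 1 / hit_rate.
    return cost_per_attempt / hit_rate


# Same price and clip length, only the hit rate changes (both rates are hypothetical).
consistent = expected_cost_per_usable_clip(1.0, 15, 0.8)
hit_or_miss = expected_cost_per_usable_clip(1.0, 15, 0.25)
print(f"consistent: ${consistent:.2f}, hit or miss: ${hit_or_miss:.2f}")
```

Plug in your own price and your own observed hit rate. The point is only that a low hit rate multiplies the real cost per usable clip.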
Prompting: This model needs very specific prompts. Unlike the others, it can take a prompt of up to 32k tokens. When you are writing the prompt for Sora, you are the architect, not the guide. You need to describe every little detail in the video. My average Sora prompt is around 5k tokens, which is around 13k characters. More on prompting below.
Kling 3, Kling 3 OMNI, Kling 2.6
If your goal is realism, Kling output is easier to spot as AI than Sora output. But Kling gives much more reliable and consistent results.
The best use case is B-rolls. If you need slight movement, like a live photo effect, this is the best model for it. But it is pretty much unusable without a starting frame.
Tip: If you want to increase the realism of color and lighting, generate a starting frame first that is already dialed in. Kling will not introduce something that is not in the starting frame.
Kling 3 is a very powerful model and can handle speaking videos, but only up to about 10 seconds. When you push it above 10 seconds you will get broken lip sync. Split the script into multiple segments if you need longer.
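If you want to automate the splitting, a minimal sketch could look like this. The 10 second limit is the one from above; the speaking rate of 2.5 words per second is my assumption (tune it to your voice), and `split_script` is a hypothetical helper, not part of any Kling tooling.

```python
import re


def split_script(script, max_seconds=10.0, words_per_second=2.5):
    """Group whole sentences into segments whose estimated spoken length fits max_seconds."""
    max_words = int(max_seconds * words_per_second)
    # Split after sentence-ending punctuation so no sentence is cut in half.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    segments, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new segment when adding this sentence would exceed the word budget.
        if current and count + n > max_words:
            segments.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        segments.append(" ".join(current))
    return segments


demo = split_script(
    "One two three. Four five six. Seven eight nine.",
    max_seconds=2,
    words_per_second=3,
)
print(demo)
```

A single sentence that is longer than the budget still ends up as its own oversized segment, so rewrite those by hand. Each segment then becomes one generation.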
Seedance 2 and 1.5 Pro
Seedance models are getting a lot of hype lately and I think it is deserved.
Seedance 1.5 Pro is best for B-rolls. I would not recommend it for talking.
Seedance 2 is really versatile. It can do B-rolls, and also talking when you use a starting frame.
The biggest advantage is the Seedance 2 reference mode, where you can upload up to 3 images and then refer to each of them in the prompt. For example: face, product, and room. All 3 are used for the generation. Kling 3 OMNI had this first, but honestly I like the Seedance 2 approach much better.
VEO 3.1
To be honest, currently the weakest model. Can only do 8 seconds and everything looks plasticky. I would not recommend it for marketing right now.
Nano Banana Pro and 2
These two are a must and my daily bread and butter. For every video generation I need a starting frame, and with Nano Banana I can upload up to 15 reference images and just compose the image I want. Edits of existing images are also insane. Nano Banana produces the most realistic visuals I have ever seen from an image model.
Grok Imagine
This model has some potential and I am looking forward to the next version. Right now I would put it next to VEO 3.1. With Sora, Kling 3, and Seedance 2 all live right now, there is no real place for Grok and VEO. There is nothing you would need them for other than maybe very specific use cases, which I have not run into in marketing.
Prompting
Seedance and Kling have similar prompt frameworks. Keep it short and to the point. Do not overwhelm the model. 2,000 characters at most.
Sora is the exact opposite. You want to be as specific as possible, up to 32k tokens. You can spend 5 lines just describing the lighting. From my experience Sora benefits from this level of detail.
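A quick way to sanity check prompt length before you hit generate is sketched below. The limits are the ones from this post (2,000 characters for Kling and Seedance, 32k tokens for Sora). The characters per token ratio is just my 13k characters for 5k tokens figure turned into a rough estimate, not a real tokenizer, and the names `LIMITS` and `check_prompt` are hypothetical.

```python
# Rough ratio implied by the post's own numbers (13k chars for 5k tokens); not a real tokenizer.
CHARS_PER_TOKEN = 13_000 / 5_000

# Limits as stated in this post: (unit, maximum size).
LIMITS = {
    "kling": ("chars", 2_000),
    "seedance": ("chars", 2_000),
    "sora": ("tokens", 32_000),
}


def check_prompt(model_family, prompt):
    """Return (fits, size, unit) for a prompt against the post's length guidance."""
    unit, limit = LIMITS[model_family]
    if unit == "chars":
        size = len(prompt)
    else:
        # Estimate the token count from the character count.
        size = round(len(prompt) / CHARS_PER_TOKEN)
    return size <= limit, size, unit


print(check_prompt("kling", "x" * 2_001))
print(check_prompt("sora", "x" * 13_000))
```

For Sora the check is only a ballpark, since the real token count depends on the tokenizer and the language of the prompt.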
What to use for what
UGC with talking actor, A-roll and B-rolls with the same actor:
A-roll: Sora 2 Pro. Can go up to 20 seconds and really nails the authenticity. If you do not want to deal with the randomness, go with Kling 3 and split the script into multiple generations.
B-roll: Seedance 2. Great at physics. If you need product handling, real world actions, or lifestyle B-rolls, this is the one.
Podcast with 2 switching actors:
Kling 3. The reason you do not want Sora here is the randomness. Kling 3 is reliable and will keep hand gestures and mouth movements consistent across multiple generations. Sora would make them move differently every time.
Explainer or educational content with B-rolls and text overlay:
Seedance or Sora 2, depending on the subject. For body related visuals like hair growth or skin cream, Sora was much better. For larger objects like blueprint overlays or paintings coming to life, Seedance handles it better.
Bottom line
You always want to have multiple models at your disposal. If you are creating ads daily, you need B-rolls and talking. One model will not cut it.
If you are serious about AI ads, you also need: ElevenLabs for voice changing and cloning, voice enhancement to clear the AI reverb, video extension, and background removal.
I would recommend choosing a platform that has everything combined. You do not have to start like I did, buying subscriptions with everybody, endlessly exporting and downloading, then getting confused about which version had the clean voice and which had the background removed.
I personally used Arcads and Creatify but then switched to AutoReach. The built-in editor combined with AI subtitles made more sense, and it is one less tool to pay for.
Let me know if you want the prompt frameworks for each model. They are pretty long so I did not want to put them in the post.
Also let me know about GPT Image 2 if you have experience with it. That is the model I am going to be putting to the test next.