u/AmbientCreator

Gemini 2.5 Flash selects better short clips from long videos than I do manually - here's why

I've been testing Gemini for analyzing transcripts and identifying the highest-retention moments in long-form content before cutting Shorts.

The prompt asks it to find moments with: a strong hook in the first sentence, a complete idea under 60 seconds, and an emotional peak or surprising statement.
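A minimal sketch of how those three criteria could be assembled into a prompt string. The function name, the wording, and the `max_clips` parameter are illustrative assumptions, not the actual prompt used in the tests, and no Gemini API call is shown.

```python
# Hypothetical prompt builder: the exact wording is an assumption,
# only the three selection criteria come from the post.
CRITERIA = [
    "a strong hook in the first sentence",
    "a complete idea that fits in under 60 seconds",
    "an emotional peak or a surprising statement",
]


def build_clip_prompt(transcript: str, max_clips: int = 5) -> str:
    """Assemble a clip-selection prompt from the criteria and a transcript."""
    bullet_list = "\n".join(f"- {c}" for c in CRITERIA)
    return (
        f"From the transcript below, pick up to {max_clips} moments that have:\n"
        f"{bullet_list}\n"
        "Return the start and end timestamps for each moment.\n\n"
        f"Transcript:\n{transcript}"
    )
```

The returned string would then be sent to whichever model client you use. Keeping the criteria in a list makes it easy to A/B test one criterion at a time.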

Compared to my manual selection: Gemini's picks average 63% retention vs ~48% from gut instinct. Tested across 700+ videos on 4 channels.
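To put the two retention figures side by side, here is a small sketch that computes the absolute and relative difference. It treats the approximate 48% baseline as exact, so read the result as a rough estimate.

```python
def relative_lift(new_pct: float, baseline_pct: float) -> float:
    """Relative improvement of new_pct over baseline_pct, in percent."""
    return (new_pct - baseline_pct) / baseline_pct * 100


gemini_retention = 63.0   # average retention of Gemini's picks (from the post)
manual_retention = 48.0   # approximate average from manual selection (from the post)

print(f"absolute gain: {gemini_retention - manual_retention:.0f} points")
print(f"relative gain: {relative_lift(gemini_retention, manual_retention):.2f}%")
```

Under that assumption the gain is 15 percentage points, or roughly a 31% relative improvement.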

The bottleneck isn't the AI; it's the FFmpeg pipeline that comes after. Happy to share the full workflow if useful.

reddit.com
u/AmbientCreator — 15 days ago
▲ 16 r/n8n

Built this pipeline to run 4 ambient YouTube channels automatically.

4 stages:

- PROMPTS: generates video concepts via Gemini

- IMAGENES: Gemini Imagen 4 generates frames

- VIDEOS: Veo 3.1 via Kie.ai + Pixabay API as real footage fallback

- MUSICA: Suno Pro generates the soundtrack
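The fallback idea in the VIDEOS stage can be sketched roughly like this. The two source functions are hypothetical placeholders passed in as callables (nothing here calls the real Kie.ai or Pixabay APIs), and the "try generated first, then stock footage" order is an assumption about how the node is wired.

```python
from typing import Callable, Optional

# A clip source takes a concept string and returns a file path, or None if nothing was found.
ClipSource = Callable[[str], Optional[str]]


def fetch_clip(concept: str, primary: ClipSource, fallback: ClipSource) -> Optional[str]:
    """Try the primary source; on an error or empty result, use the fallback."""
    try:
        clip = primary(concept)
        if clip:
            return clip
    except Exception:
        # Treat any failure of the primary source as "no clip" and fall through.
        pass
    return fallback(concept)
```

Passing the sources in as callables keeps the fallback logic testable without network access.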

FFmpeg handles the 3-hour render loop.

YouTube Data API handles upload with metadata.

Cost per video: ~$0.30 in API credits

Output: 714 videos published across 4 channels so far
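For a rough sense of total spend, a quick sketch that multiplies the two figures above. It assumes every video cost the quoted average, which is only an approximation.

```python
COST_PER_VIDEO_CENTS = 30   # ~$0.30 per video (approximate, from the post)
VIDEOS_PUBLISHED = 714      # published so far (from the post)


def total_cost_dollars(videos: int, cents_each: int) -> float:
    """Total API spend in dollars, computed in integer cents to avoid float drift."""
    return videos * cents_each / 100


print(f"${total_cost_dollars(VIDEOS_PUBLISHED, COST_PER_VIDEO_CENTS):.2f}")  # $214.20
```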

Workflow JSON:

https://gist.github.com/carismaxinfo-oss/ea8d1519c6539a067d5685f2b3798e7b

Happy to answer questions about specific nodes.

u/AmbientCreator — 16 days ago

Been running 4 ambient channels for about 6 months. Here's what the data actually shows.

Same pipeline, same production quality, completely different retention:

| Title | Retention |
|---|---|
| "432Hz Healing DNA Repair" | 0.43% |
| "Amalfi Coast at Sunrise" | ~14% |
| "Tibetan Monastery at Dawn 3,800m" | 25.19% |
| "Colosseum Underground — Where Gladiators Waited" | 50.16% |

The pattern is always the same: specific real place + specific moment + unusual detail = people stay. Generic wellness keywords = instant exit, every single time across 76 meditation videos.

I stopped using frequency numbers entirely. Now the formula is: [Location] at [Time of Day] | [Emotional Hook] | 3 Hours
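The title formula is easy to turn into a helper. A minimal sketch: the function name is made up, and the emotional hook in the example is invented for illustration.

```python
def make_title(location: str, time_of_day: str, hook: str, hours: int = 3) -> str:
    """Format a title as: [Location] at [Time of Day] | [Emotional Hook] | N Hours."""
    unit = "Hour" if hours == 1 else "Hours"
    return f"{location} at {time_of_day} | {hook} | {hours} {unit}"


# Example with a made-up hook:
print(make_title("Amalfi Coast", "Sunrise", "Waves Below the Cliffs"))
# Amalfi Coast at Sunrise | Waves Below the Cliffs | 3 Hours
```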

One more thing that surprised me: for dark academia, adding "SECRET" or "FORBIDDEN" to the title raised retention from 3-5% to 22.44%. Same video, same production. Just the title hook.

Happy to answer questions about niche selection or the data.

u/AmbientCreator — 17 days ago

Been running this for about 6 months across 4 channels. Sharing the stack because I haven't seen this exact combination documented anywhere.

- Image generation: Gemini Imagen 4 (6 images per video concept)

- Video clips: Veo 3.1 via Kie.ai API (3 clips) + Pixabay real footage (3 clips) as fallback

- Music: Suno Pro API → looped to 3 hours with FFmpeg crossfade

- Rendering: FFmpeg concat + zoompan for Ken Burns on stills

- Upload: YouTube Data API v3 with automated metadata

- Automation: Python + n8n
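For the music step, one thing worth working out before building the FFmpeg command is how many copies of a track are needed to reach 3 hours once crossfades eat into the total. A sketch of that arithmetic, assuming each crossfade overlaps two adjacent copies by a fixed number of seconds. The 240 s track length and 5 s crossfade are made-up example values.

```python
import math


def copies_needed(track_s: float, crossfade_s: float, target_s: float = 3 * 3600) -> int:
    """Number of copies so that N*track - (N-1)*crossfade >= target."""
    if crossfade_s >= track_s:
        raise ValueError("crossfade must be shorter than the track")
    return max(1, math.ceil((target_s - crossfade_s) / (track_s - crossfade_s)))


def looped_length(copies: int, track_s: float, crossfade_s: float) -> float:
    """Total duration after chaining `copies` tracks with overlapping crossfades."""
    return copies * track_s - (copies - 1) * crossfade_s


n = copies_needed(track_s=240, crossfade_s=5)
print(n, looped_length(n, 240, 5))  # 46 copies, 10815 seconds
```

The result overshoots the target slightly, so the final audio would still need trimming to exactly 10800 seconds, for example with FFmpeg's `-t` output option.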

Output is a 3-hour ambient video at 1280x720 for about $0.30 in API costs.

One thing worth sharing: early versions used Ken Burns zoom starting at 1.0. The result looked completely static because the zoom range was too small to perceive. Switching to Veo 3.1 for half the clips and real Pixabay footage for the other half made a huge difference in perceived quality.

The FFmpeg zoompan expression that took me too long to find: z='min(1.05+on*0.0003,1.25)'. Starting from 1.05 instead of 1.0 prevents the vibration artifact where the zoom resets between clips.
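To see what that expression does frame by frame, here is a small sketch that mirrors it in Python and builds the corresponding filter string. In zoompan, `on` is the output frame number. The 25 fps rate and the `d=750` frame count are assumptions for illustration, and the 1280x720 size comes from the output resolution mentioned above.

```python
import math


def zoom_at(frame: int, start: float = 1.05, step: float = 0.0003, cap: float = 1.25) -> float:
    """Mirror of min(start + on*step, cap) for a given output frame number."""
    return min(start + frame * step, cap)


def frames_to_cap(start: float = 1.05, step: float = 0.0003, cap: float = 1.25) -> int:
    """First frame at which the zoom reaches the cap."""
    return math.ceil((cap - start) / step)


def zoompan_filter(frames: int = 750, size: str = "1280x720", fps: int = 25) -> str:
    """Build the filter string; frames and fps are assumed example values."""
    return f"zoompan=z='min(1.05+on*0.0003,1.25)':d={frames}:s={size}:fps={fps}"


cap_frame = frames_to_cap()
print(cap_frame, round(cap_frame / 25, 2))  # frame count and seconds at an assumed 25 fps
print(zoompan_filter())
```

With these numbers the zoom climbs from 1.05 to the 1.25 cap over 667 frames, which is a little under 27 seconds at 25 fps, and then holds steady for the rest of the still.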

Happy to answer questions about any part of the stack.

u/AmbientCreator — 17 days ago