I built an AI Video Generator that actually thinks like a Film Director.
I’ve been experimenting with AI video generators, and honestly, most of them output the exact same cheesy, generic stuff. If your script says "You are the architect of your own ruin," the AI will literally insert a stock clip of an architect drawing blueprints. It completely breaks immersion and feels like a bad corporate ad.
I wanted to make high-quality, moody, psychological videos, so I decided to build my own end-to-end pipeline using Node.js, React (Remotion), and multiple LLM Agents.
Instead of basic text-to-video matching, I built a "Visual Director" agent programmed with a philosophy I call "Poetic Literalism."
Here’s how it works:
- The Script Agent: Writes an emotionally paced script based on a psychological topic.
- The Director Agent: Reads the script and maps the emotion of each line to a cinematic B-roll metaphor instead of matching its literal words.
  - "You are trapped in your mind" -> Searches Pexels for `bird in dark cage`
  - "You ruin your life" -> Searches for `falling dominoes dark` or `shattering glass`
  - "Swimming against the current" -> Searches for `turbulent dark river`
- The Orchestration: The agent actively manages "tension and release." It builds tension with dark, claustrophobic architecture shots, and then drops a massive, sweeping nature shot right at the moment of philosophical realization.
- The Render Engine: It fetches all the unique clips, syncs them to the audio timestamps, and renders the final video natively using Remotion.
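
Here is a rough sketch of how the Director and timing steps could fit together, in TypeScript. Everything named here (`ScriptSegment`, `METAPHORS`, `planTimeline`, the `release` mood and its query) is hypothetical and not taken from the actual project. In the real pipeline the metaphor choice is made by an LLM agent; the lookup table below only stands in for that decision so the example runs on its own.

```typescript
// Hypothetical mood labels; the real Director agent decides these with an LLM.
type Mood = "trapped" | "ruin" | "struggle" | "release";

interface ScriptSegment {
  text: string;
  mood: Mood;       // assumed to be assigned upstream by the Director agent
  startSec: number; // audio timestamp where this line of narration starts
  endSec: number;   // audio timestamp where it ends
}

interface PlannedClip {
  query: string;            // stock-footage search query
  from: number;             // first frame of the clip on the timeline
  durationInFrames: number; // how long the clip stays on screen
}

// Queries for the first three moods come from the post's examples.
// The "release" entry is an assumed placeholder for the sweeping nature shot.
const METAPHORS: Record<Mood, string[]> = {
  trapped: ["bird in dark cage"],
  ruin: ["falling dominoes dark", "shattering glass"],
  struggle: ["turbulent dark river"],
  release: ["sweeping mountain landscape"],
};

function planTimeline(segments: ScriptSegment[], fps: number): PlannedClip[] {
  // Count how often each mood has appeared so repeated moods rotate
  // through their queries instead of reusing the same one.
  const seen = new Map<Mood, number>();
  return segments.map((seg) => {
    const options = METAPHORS[seg.mood];
    const n = seen.get(seg.mood) ?? 0;
    seen.set(seg.mood, n + 1);
    // Convert audio timestamps (seconds) into frame numbers.
    const from = Math.round(seg.startSec * fps);
    const end = Math.round(seg.endSec * fps);
    return {
      query: options[n % options.length],
      from,
      durationInFrames: Math.max(1, end - from),
    };
  });
}
```

Each `PlannedClip` lines up with the `from` and `durationInFrames` props of Remotion's `<Sequence>` component, so a plan like this could be mapped straight onto the composition once the clips are downloaded.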
I even built a local override script: if the agent picks a clip I don't love, I can just type the segment number in my terminal and delete the video, and the script will automatically blacklist the old video ID and fetch a brand new cinematic metaphor.
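
The blacklist-and-refetch idea can be sketched roughly like this. The names (`Candidate`, `pickReplacement`) and the in-memory `Set` are assumptions made for illustration; the actual script presumably persists its blacklist somewhere and runs a fresh Pexels search to get the candidates.

```typescript
// Hypothetical shape of a search result; only the ID matters for blacklisting.
interface Candidate {
  id: number;
  url: string;
}

// Blacklist the rejected clip, then return the first candidate that has
// never been rejected. Returns undefined when every candidate is blacklisted.
function pickReplacement(
  candidates: Candidate[],
  rejectedId: number,
  blacklist: Set<number>,
): Candidate | undefined {
  blacklist.add(rejectedId);
  return candidates.find((c) => !blacklist.has(c.id));
}
```

Because the blacklist is mutated in place, rejecting the same segment several times in a row keeps moving to clips that have not been shown before.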
It went from looking like cheap AI sludge to feeling like a heavy, moody documentary. Has anyone else tried forcing AI agents to think purely in B-roll metaphors? It completely changes the output quality.