wrote an ansible playbook that provisions a video transcript search tool on a fresh ubuntu VM in about 4 minutes
i work at an MSP and we have about 180 youtube videos: recorded knowledge transfer sessions, vendor training walkthroughs, internal runbook recordings, and client onboarding demos. they're all shared through a teams channel where the links get buried in message history within a week. every time someone new joins the team the question is always "where are the training videos?" and the answer is "scroll up in teams", which is useless.
i built a small internal tool that makes the videos searchable by what was actually said in them. it's a flask app with a postgres backend that uses postgres's built-in full text search. one search box, and results come back with the video title, date, and a snippet of the transcript around the match. simple stuff.
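a minimal sketch of what the query behind that search box could look like, not the tool's actual code. it assumes a hypothetical videos table with title, published_at, url, transcript and a tsvector column called search_vector; websearch_to_tsquery, ts_rank and ts_headline are standard postgres full text search functions.

```python
# sketch of a full text search query with a transcript snippet around the match.
# assumes a hypothetical table videos(title, published_at, url, transcript, search_vector),
# where search_vector is a tsvector column covered by a GIN index.

SEARCH_SQL = """
SELECT title,
       published_at,
       url,
       ts_headline('english', transcript, query,
                   'MinWords=10, MaxWords=30') AS snippet
FROM videos, websearch_to_tsquery('english', %(q)s) AS query
WHERE search_vector @@ query
ORDER BY ts_rank(search_vector, query) DESC
LIMIT %(limit)s
"""


def search_videos(conn, q, limit=20):
    """Run the search over a DB-API connection (e.g. psycopg2 or psycopg 3).

    websearch_to_tsquery accepts free-form input ("quoted phrases", -exclusions, or)
    so whatever the user types can be passed straight through as a bound parameter.
    """
    q = q.strip()
    if not q:
        return []  # an empty box would match nothing anyway; skip the round trip
    with conn.cursor() as cur:
        cur.execute(SEARCH_SQL, {"q": q, "limit": limit})
        return cur.fetchall()
```

binding the user's text as a parameter keeps it out of the SQL string, and websearch_to_tsquery is documented to never raise syntax errors on odd input, which suits a single free-text box.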
the part i wanted to get right was the deployment. we spin up VMs for internal tools regularly and i didn't want this to be another snowflake that someone set up manually and nobody can recreate. so i wrote an ansible playbook that takes a fresh ubuntu 22.04 VM and gets the whole thing running.
the playbook does:
- installs postgres, python3, pip, nginx, nodejs
- creates the postgres database, user, and the tables with the tsvector column and GIN index
- copies the flask app and the ingestion script to the server
- installs the python dependencies with pip into a venv
- sets up a systemd service that runs the flask app under gunicorn
- configures nginx as a reverse proxy
- runs the initial transcript ingestion
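the postgres step (a table with a tsvector column and a GIN index) could look something like the sketch below. the table and column names are hypothetical, and the generated column follows the pattern from the postgres docs; it needs postgres 12 or newer, and ubuntu 22.04's default package is postgres 14.

```python
# hypothetical schema: a videos table with a generated tsvector column plus a GIN index.
# names are illustrative, not the tool's real schema.

SCHEMA_STATEMENTS = [
    """
    CREATE TABLE IF NOT EXISTS videos (
        id            serial PRIMARY KEY,
        video_id      text NOT NULL UNIQUE,
        url           text NOT NULL,
        title         text NOT NULL,
        published_at  date,
        transcript    text NOT NULL,
        -- postgres recomputes this on every insert/update,
        -- so the ingestion script never has to build it
        search_vector tsvector GENERATED ALWAYS AS (
            to_tsvector('english',
                        coalesce(title, '') || ' ' || coalesce(transcript, ''))
        ) STORED
    )
    """,
    """
    CREATE INDEX IF NOT EXISTS videos_search_idx
        ON videos USING GIN (search_vector)
    """,
]


def apply_schema(conn):
    """Create the table and index; IF NOT EXISTS makes reruns a no-op."""
    with conn.cursor() as cur:
        for statement in SCHEMA_STATEMENTS:
            cur.execute(statement)
    conn.commit()
```

the IF NOT EXISTS clauses are what let a step like this run on every playbook run without failing on a VM that's already set up.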
the ingestion step uses transcript api to pull the transcripts:
```
npx skills add ZeroPointRepo/youtube-skills --skill youtube-full
```
the playbook runs the ingestion script with ansible.builtin.command; the script reads urls from a file and processes them. the whole playbook is about 120 lines of yaml across 3 roles: postgres, app, and nginx.
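a rough sketch of what an ingestion script like that might do, under a few assumptions: the urls file has one youtube url per line, rows go into the hypothetical videos table keyed on video_id, and fetch_transcript is a hypothetical callable standing in for whatever wraps the transcript api (assumed to return a dict with "title", "transcript" and optionally "published_at").

```python
# sketch: read youtube urls from a file and upsert one row per video.
# fetch_transcript is a hypothetical stand-in for the transcript api wrapper.
from urllib.parse import parse_qs, urlparse

UPSERT_SQL = """
INSERT INTO videos (video_id, url, title, published_at, transcript)
VALUES (%s, %s, %s, %s, %s)
ON CONFLICT (video_id) DO UPDATE
SET url = EXCLUDED.url,
    title = EXCLUDED.title,
    published_at = EXCLUDED.published_at,
    transcript = EXCLUDED.transcript
"""

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def read_urls(path):
    """Yield non-empty lines from the urls file, skipping # comments."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#"):
                yield line


def video_id_from_url(url):
    """Extract the video id from common youtube url shapes, or return None."""
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        parts = parsed.path.split("/")
        if len(parts) >= 3 and parts[1] in ("shorts", "embed", "live"):
            return parts[2] or None
    return None


def ingest(conn, urls, fetch_transcript):
    """Upsert each video; returns (stored_count, skipped_urls)."""
    stored, skipped = 0, []
    for url in urls:
        video_id = video_id_from_url(url)
        if video_id is None:
            skipped.append(url)
            continue
        item = fetch_transcript(video_id)
        with conn.cursor() as cur:
            cur.execute(
                UPSERT_SQL,
                (video_id, url, item["title"], item.get("published_at"), item["transcript"]),
            )
        conn.commit()  # commit per video so one bad fetch doesn't undo earlier work
        stored += 1
    return stored, skipped
```

the ON CONFLICT upsert is what keeps a rerun harmless: ansible.builtin.command reports "changed" on every run unless the task sets changed_when or creates, so the script itself has to tolerate being run against a database that already has the videos.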
the thing that made it worth doing properly was the first time a colleague needed to set up the same tool for a different team. he ran the playbook against a new VM, changed the urls file, and had it running in 4 minutes. no documentation to follow, no steps to miss, no "did you remember to create the postgres user" messages in slack.
about 180 videos indexed. the MSP team uses it to find specific vendor training videos before client calls. the onboarding team uses it to point new hires at specific recordings. the playbook has been run 3 times now on 3 different VMs for 3 different teams.