u/scheemunai_

wrote an ansible playbook that provisions a video transcript search tool on a fresh ubuntu VM in about 4 minutes

i work at an MSP and we have about 180 youtube videos. recorded knowledge transfer sessions, vendor training walkthroughs, internal runbook recordings, client onboarding demos. all shared through a teams channel where the links get buried in message history within a week. every time someone new joins the team the question is always "where are the training videos" and the answer is "scroll up in teams" which is useless.

i built a small internal tool that makes the videos searchable by what was actually said in them. flask app with a postgres backend using full text search. one search box, results come back with the video title, date, and a snippet of the transcript around the match. simple stuff.
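for anyone wondering what the search side could look like, here's a rough sketch, not the actual code from the tool. it assumes a hypothetical `videos` table with `title`, `published_at`, `transcript`, and a `transcript_tsv` tsvector column, plus any DB-API connection such as psycopg2. `plainto_tsquery`, `ts_rank`, and `ts_headline` are standard postgres functions; the table name, column names, and snippet options are assumptions.

```python
# Sketch only: the table and column names (videos, transcript_tsv, ...) are
# hypothetical, not taken from the original tool.
SEARCH_SQL = """
SELECT title,
       published_at,
       ts_headline('english', transcript, query,
                   'MaxWords=30, MinWords=10') AS snippet
FROM videos, plainto_tsquery('english', %s) AS query
WHERE transcript_tsv @@ query
ORDER BY ts_rank(transcript_tsv, query) DESC
LIMIT %s
"""


def search(conn, term, limit=20):
    """Run the full text search on a DB-API connection and return dict rows."""
    cur = conn.cursor()
    try:
        # Parameters are passed separately so the search term is never
        # interpolated into the SQL string.
        cur.execute(SEARCH_SQL, (term, limit))
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()
```

a flask route would then just call `search(conn, request.args["q"])` and render the rows. `ts_headline` is what produces the snippet of transcript around the match.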

the part i wanted to get right was the deployment. we spin up VMs for internal tools regularly and i didn't want this to be another snowflake that someone set up manually and nobody can recreate. so i wrote an ansible playbook that takes a fresh ubuntu 22.04 VM and gets the whole thing running.

the playbook does:

installs postgres, python3, pip, nginx, nodejs
creates the postgres database, user, and the tables with the tsvector column and GIN index
copies the flask app and the ingestion script to the server
installs the python dependencies with pip into a venv
sets up a systemd service for the flask app running behind gunicorn
configures nginx as a reverse proxy
runs the initial transcript ingestion
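the tsvector column and GIN index step could be sketched roughly like this. this is an illustration under assumptions, not the playbook's real schema: the `videos` table, its columns, and the index name are all hypothetical. it uses a stored generated column, which needs postgres 12 or newer, so the tsvector stays in sync with the transcript without a trigger.

```python
# Sketch only: table, column, and index names are hypothetical.
SCHEMA_STATEMENTS = [
    """
    CREATE TABLE IF NOT EXISTS videos (
        id             BIGSERIAL PRIMARY KEY,
        url            TEXT UNIQUE NOT NULL,
        title          TEXT NOT NULL,
        published_at   DATE,
        transcript     TEXT NOT NULL,
        transcript_tsv tsvector GENERATED ALWAYS AS
                       (to_tsvector('english', transcript)) STORED
    )
    """,
    """
    CREATE INDEX IF NOT EXISTS videos_transcript_tsv_idx
        ON videos USING GIN (transcript_tsv)
    """,
]


def apply_schema(conn):
    """Create the table and index on a DB-API connection (e.g. psycopg2)."""
    cur = conn.cursor()
    try:
        for statement in SCHEMA_STATEMENTS:
            cur.execute(statement)
        conn.commit()
    finally:
        cur.close()
```

the `IF NOT EXISTS` clauses make it safe to run again, which matters when the playbook gets re-run against a VM that is already set up.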

the ingestion step uses transcript api to pull the transcripts:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the playbook calls the ingestion script with ansible.builtin.command. the script reads urls from a file and processes them one by one. the whole playbook is about 120 lines of yaml across 3 roles: postgres, app, and nginx.
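a minimal sketch of the "read urls from a file and process them" part, assuming one url per line with blank lines and `#` comments ignored. the file format, the function names, and the `fetch_transcript` / `store` callables are all hypothetical placeholders; the actual transcript fetching and database write are not shown here.

```python
from pathlib import Path


def read_urls(path):
    """Return unique, non-empty, non-comment lines from the urls file, in order."""
    seen = set()
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if not url or url.startswith("#") or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls


def ingest(path, fetch_transcript, store):
    """Fetch and store each url; collect failures instead of aborting the run.

    fetch_transcript(url) -> str and store(url, text) are supplied by the
    caller (hypothetical hooks for the transcript source and the database).
    """
    failed = []
    for url in read_urls(path):
        try:
            store(url, fetch_transcript(url))
        except Exception as exc:  # one bad video should not stop the rest
            failed.append((url, str(exc)))
    return failed
```

returning the failures rather than raising means the script can finish the batch and then exit non-zero at the end if anything failed, so the ansible task still reports the problem.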

the thing that made it worth doing properly was the first time a colleague needed to set up the same tool for a different team. he ran the playbook against a new VM, changed the urls file, and had it running in 4 minutes. no documentation to follow, no steps to miss, no "did you remember to create the postgres user" messages in slack.

about 180 videos indexed. the MSP team uses it to find specific vendor training videos before client calls. the onboarding team uses it to point new hires at specific recordings. the playbook has been run 3 times now on 3 different VMs for 3 different teams.
