u/scheemunai_

wrote an ansible playbook that provisions a video transcript search tool on a fresh ubuntu VM in about 4 minutes

i work at an MSP and we have about 180 youtube videos. recorded knowledge transfer sessions, vendor training walkthroughs, internal runbook recordings, client onboarding demos. all shared through a teams channel where the links get buried in message history within a week. every time someone new joins the team the question is always "where are the training videos" and the answer is "scroll up in teams" which is useless.

i built a small internal tool that makes the videos searchable by what was actually said in them. flask app with a postgres backend using full text search. one search box, results come back with the video title, date, and a snippet of the transcript around the match. simple stuff.
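a minimal sketch of what the search side of a tool like this could look like, assuming a hypothetical `videos` table with a generated `search_vector` column and psycopg-style named placeholders. the table, column, and function names here are illustrative assumptions, not the actual app's code.

```python
# Hypothetical schema: every name here is an assumption for illustration.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS videos (
    id            serial PRIMARY KEY,
    title         text NOT NULL,
    published     date,
    url           text UNIQUE NOT NULL,
    transcript    text NOT NULL,
    search_vector tsvector GENERATED ALWAYS AS
                  (to_tsvector('english', transcript)) STORED
);
CREATE INDEX IF NOT EXISTS videos_search_idx
    ON videos USING GIN (search_vector);
"""

# websearch_to_tsquery accepts free-form user input; ts_headline builds
# the snippet around the match; ts_rank orders the results.
SEARCH_SQL = """
SELECT title, published, url,
       ts_headline('english', transcript, q,
                   'MaxWords=30, MinWords=10') AS snippet
FROM videos, websearch_to_tsquery('english', %(query)s) AS q
WHERE search_vector @@ q
ORDER BY ts_rank(search_vector, q) DESC
LIMIT %(limit)s
"""


def build_search(query, limit=20):
    """Return (sql, params) for a search, or None for an empty query."""
    cleaned = " ".join(query.split())  # collapse stray whitespace
    if not cleaned:
        return None
    return SEARCH_SQL, {"query": cleaned, "limit": max(1, min(limit, 100))}
```

with a driver such as psycopg this would be used roughly as `sql, params = build_search(text)` followed by `cursor.execute(sql, params)`. keeping the user input in the parameters rather than in the sql string avoids injection problems.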

the part i wanted to get right was the deployment. we spin up VMs for internal tools regularly and i didn't want this to be another snowflake that someone set up manually and nobody can recreate. so i wrote an ansible playbook that takes a fresh ubuntu 22.04 VM and gets the whole thing running.

the playbook does:

  • installs postgres, python3, pip, nginx, nodejs
  • creates the postgres database, user, and the tables with the tsvector column and GIN index
  • copies the flask app and the ingestion script to the server
  • installs the python dependencies with pip into a venv
  • sets up a systemd service for the flask app running behind gunicorn
  • configures nginx as a reverse proxy
  • runs the initial transcript ingestion

the ingestion step uses transcript api to pull the transcripts:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the playbook calls the ingestion script with ansible.builtin.command. the script reads urls from a file and processes them one by one. the whole playbook is about 120 lines of yaml across 3 roles: postgres, app, and nginx.

the thing that made it worth doing properly was the first time a colleague needed to set up the same tool for a different team. he ran the playbook against a new VM, changed the urls file, and had it running in 4 minutes. no documentation to follow, no steps to miss, no "did you remember to create the postgres user" messages in slack.

about 180 videos indexed. the MSP team uses it to find specific vendor training videos before client calls. the onboarding team uses it to point new hires at specific recordings. the playbook has been run 3 times now on 3 different VMs for 3 different teams.

reddit.com
u/scheemunai_ — 13 hours ago

the terraform for my video transcript search tool took longer to write than the actual application

i work at a consulting firm and we needed an internal tool to search across about 200 youtube videos by what was said in them. recorded client workshops, internal tech talks, vendor demos, the usual. i figured i'd build something simple and provision it properly with terraform from the start instead of clicking around in the console and regretting it later.

the app itself is straightforward. a python lambda that takes a search query, runs it against a postgres RDS instance with full text search, and returns matching videos with transcript snippets. an API gateway HTTP api in front of it. a static S3 bucket with cloudfront for the html frontend. one search box, results show up below.

the ingestion side is a separate lambda triggered by an SQS queue. drop a youtube url into the queue and the lambda pulls the transcript, parses it, and inserts it into postgres. for the transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the application code across both lambdas is maybe 120 lines of python. the terraform is 380 lines.
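a minimal sketch of the queue-triggered ingestion handler, assuming each SQS message body is a single youtube url. `fetch_transcript` and `store_transcript` are hypothetical helpers standing in for the api call and the postgres insert. the `batchItemFailures` return value only has an effect if `ReportBatchItemFailures` is enabled on the event source mapping, which is an assumption about the setup.

```python
def make_handler(fetch_transcript, store_transcript):
    """Build a Lambda handler around two injected (hypothetical) helpers."""

    def handler(event, context):
        failures = []
        # An SQS-triggered Lambda receives a batch under event["Records"].
        for record in event.get("Records", []):
            try:
                url = record["body"].strip()
                store_transcript(url, fetch_transcript(url))
            except Exception:
                # Report only this message as failed so SQS retries it
                # without redelivering the rest of the batch.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

injecting the helpers keeps the handler testable without AWS credentials or a database connection.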

the RDS module alone is about 80 lines. the instance, subnet group, security group, parameter group for the postgres FTS config, the secret in secrets manager for the credentials, the IAM policy for the lambda to read the secret. then the lambda module for each function with the execution role, the log group, the VPC config so it can reach RDS, the security group for the lambda. API gateway with the integration, stage, and route. S3 bucket with the bucket policy, cloudfront distribution with the OAC, the ACM cert for the custom domain, the route53 record.

every one of those resources is maybe 5-10 lines of HCL but there are 30+ resources and they all reference each other. the security group for the lambda allows outbound to the security group on RDS. the lambda role needs the secrets manager policy and the VPC execution policy. the API gateway needs the lambda invoke permission. it's not complicated individually, it's just a lot of wiring.

the part that took the longest was the cloudfront + S3 + OAC setup. origin access control replaced OAI and the bucket policy has to reference the cloudfront distribution ARN which creates a circular dependency if you're not careful. ended up using a data source for the cloudfront distribution in the bucket policy to break the cycle.

i spent maybe 2 hours on the python and 6 hours on the terraform. the ratio feels wrong but the infrastructure is now reproducible, documented, and i can tear it down and rebuild it with one command. the python would have been the same 2 hours whether i used terraform or clicked through the console. the terraform just front-loads the pain.

about 200 videos indexed. the consultants use it daily before client calls. monthly AWS cost is about $18, mostly the RDS instance.

u/scheemunai_ — 1 day ago
▲ 1 r/neovim

wrote a neovim plugin that pulls youtube video transcripts into buffers and searches them with telescope

i work at a startup and we record a lot of stuff on youtube. engineering deep dives, product demos, design reviews, recorded standups when someone can't make it. about 170 videos at this point. finding anything meant opening youtube and hoping you remembered the title or asking in slack if anyone remembered which video covered a specific topic.

i wanted to search through them from neovim so i built a plugin for it.

the plugin has two commands. :TranscriptFetch takes a youtube url, pulls the transcript, and saves it as a markdown file in a configurable directory. the file has yaml front matter with the title, date, speaker, tags, and youtube url, followed by the full transcript text. :TranscriptSearch opens a telescope picker that searches across all transcript files with live grep previewing matches in the preview window.
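the plugin itself is lua, but the file layout is easy to show in a few lines of python. this is a sketch under assumptions: the field names follow the description above, and values are written as JSON literals, which are also valid YAML flow values. `render_transcript_file` and `read_url` are hypothetical names.

```python
import json

FIELDS = ("title", "date", "speaker", "tags", "url")


def render_transcript_file(meta, transcript):
    """Return markdown text: YAML front matter, then the transcript."""
    lines = ["---"]
    for key in FIELDS:
        # JSON strings and lists double as YAML flow scalars/sequences.
        lines.append(f"{key}: {json.dumps(meta.get(key, ''))}")
    lines += ["---", "", transcript.strip(), ""]
    return "\n".join(lines)


def read_url(text):
    """Read the url field back out of the front matter, or None."""
    lines = text.splitlines()
    if not lines or lines[0] != "---":
        return None
    for line in lines[1:]:
        if line == "---":  # end of front matter
            break
        if line.startswith("url: "):
            return json.loads(line[len("url: "):])
    return None
```

a keymap that opens the video only needs the equivalent of `read_url` on the current buffer's text.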

the fetch part calls transcript api to get the raw transcript:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the lua code uses vim.fn.system to call curl against the api, then vim.json.decode to parse the response. it writes the file with vim.fn.writefile. the entire fetch module is about 40 lines of lua.

the telescope integration was the fun part. i wrote a custom picker that uses live_grep scoped to the transcript directory. as you type, telescope re-runs the grep across all transcript files and the preview window shows the file with the match highlighted. select a result and it opens the buffer with the cursor on the matching line. from there i can read the context around the match and if i need to watch the actual video, i have a keymap that reads the youtube url from the front matter and opens it with vim.ui.open.

i also added a telescope picker that lists all transcripts sorted by date and lets you filter by tags from the front matter. it uses builtin.find_files with a custom entry maker that displays the title and date instead of the filename.

the plugin is maybe 150 lines of lua across three files. init.lua for the setup and commands, fetch.lua for the api call and file writing, telescope.lua for the pickers.

about 170 transcripts in the directory now. i batch-imported them with a shell script that reads urls from a file and calls :TranscriptFetch in a loop through nvim --headless. the team doesn't use neovim so i'm the only one using the plugin, but i search through these transcripts multiple times a day before standups and design reviews. knowing what was said in previous discussions before starting a new one saves a lot of repeated conversations.

u/scheemunai_ — 1 day ago
▲ 14 r/perl

wrote a perl script that indexes youtube video transcripts into sqlite and a mojolicious app to search them. one script, one app, no nonsense

i work at an engineering firm and we have about 150 youtube videos. recorded lunch and learns, vendor product demos, safety training sessions, project retrospectives people recorded for the next team. everything is unlisted and shared through a confluence page with links sorted by date. finding anything means scrolling through a list of links with titles like "Recording - March 14" and hoping you click the right one.

i wrote a perl script to make them searchable last friday afternoon.

the ingestion script takes a text file of youtube urls. for each url it pulls the full transcript, then inserts it into a sqlite database along with the video title, date, presenter, tags, and the youtube link. the database has a regular table for metadata and an FTS5 virtual table for the transcript text.

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the script uses LWP::UserAgent to call the api and JSON::PP to parse the response. DBI with DBD::SQLite for the database. the insert does both tables in a single transaction. the whole ingestion script is about 60 lines and most of that is the argument parsing with Getopt::Long.

the search app is Mojolicious::Lite. one route for the search page, one route for the query. the query runs a MATCH against the FTS5 table and uses snippet() to pull the transcript excerpt around the match. results come back with the video title, date, presenter, and the snippet with the match highlighted. the template is inlined in the script under __DATA__ so the entire app is one file.

morbo search.pl

and it's running. deployed it on a linux box we use for internal tools by switching morbo for hypnotoad. the whole app including the template is maybe 80 lines.
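for anyone who doesn't read perl, here is a minimal sketch of the same two-table layout and search query. it swaps DBI for python's built-in sqlite3 module and assumes the underlying sqlite build includes FTS5. the table and column names are assumptions, not taken from the actual script.

```python
import sqlite3


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS videos (
            id INTEGER PRIMARY KEY,
            title TEXT, date TEXT, presenter TEXT, url TEXT UNIQUE
        );
        CREATE VIRTUAL TABLE IF NOT EXISTS transcripts USING fts5(body);
    """)
    return conn


def add_video(conn, title, date, presenter, url, body):
    # "with conn" wraps both inserts in one transaction.
    with conn:
        cur = conn.execute(
            "INSERT INTO videos (title, date, presenter, url) VALUES (?, ?, ?, ?)",
            (title, date, presenter, url),
        )
        # Reuse the metadata id as the FTS rowid so the tables join cleanly.
        conn.execute(
            "INSERT INTO transcripts (rowid, body) VALUES (?, ?)",
            (cur.lastrowid, body),
        )


def search(conn, query, limit=10):
    # snippet(table, column, open, close, ellipsis, max_tokens)
    return conn.execute(
        """
        SELECT v.title, v.date, v.presenter, v.url,
               snippet(transcripts, 0, '[', ']', '...', 12)
        FROM transcripts
        JOIN videos v ON v.id = transcripts.rowid
        WHERE transcripts MATCH ?
        ORDER BY rank
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()
```

one caveat: the string passed to MATCH is parsed as FTS5 query syntax, so raw user input containing characters like quotes or hyphens may need quoting before it reaches the query.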

the thing i appreciate about this project is that it's exactly the kind of problem perl was made for. take some data from an api, shove it in a database, put a small web interface on it. no framework decisions, no dependency management headaches, no build step. LWP, DBI, Mojo, JSON. stuff that's been stable for years and will keep working without me touching it.

about 150 videos indexed. the engineers use it before project kickoffs to find out if a previous team recorded a retrospective on a similar project. the safety team uses it to verify specific topics were covered in training recordings. one of the senior engineers told me it saved him from rewatching a 2 hour vendor demo because he searched for the specific integration he needed and found the 3 minute section where the vendor covered it.

u/scheemunai_ — 2 days ago
▲ 11 r/haskell

wrote a small servant app that searches youtube video transcripts and i'm weirdly pleased with how the types turned out

i work at a legal tech company and we have about 190 youtube videos. recorded CLE presentations, product walkthroughs for attorneys, internal engineering talks, some webinar recordings from conferences. the attorneys kept asking "do we have a video that explains X" and nobody had a fast way to answer because the videos are titled things like "Webinar Recording - Nov 2024."

i built a search tool for it in haskell over a weekend. mostly because i wanted a real project to work on outside of the toy stuff i've been doing while learning.

the api is defined with servant. two endpoints. one serves the static html page, the other takes a query parameter and returns search results as json. the result type is a record with the video title, date, speaker, transcript snippet, and youtube url. servant gives me the type-safe routing and the aeson instances get derived generically for the response type. the whole api definition and server are maybe 60 lines.

the database layer uses postgresql-simple. i wrote a search function that takes a connection and a query string and returns [SearchResult]. the query uses postgres full text search with tsvector on the transcript column and ts_headline for the snippet. the raw sql is in a quasiquoted string which isn't beautiful but it works and the function signature keeps everything honest. the db module is about 40 lines.

for pulling the transcripts i wrote a separate ingestion binary. it reads urls from a file, calls transcript api for each one:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the ingestion tool uses http-conduit for the api calls and aeson to parse the json response. each transcript gets inserted into postgres with the metadata. about 50 lines for the ingestion tool.

the part i enjoyed was defining the SearchResult type and having everything downstream just work. the aeson ToJSON instance is derived, servant uses it to serialize the response, and the postgresql-simple FromRow instance maps the query results. once the types compiled i didn't have a single runtime error related to data shape. the only bugs were in the sql query itself which is the one place haskell can't help you.

deployed it with a nix flake that builds the executable. copied it to our internal server, set up a systemd service, done. about 190 videos indexed. the legal team uses it to find CLE recordings by topic. one attorney searched for "fiduciary duty" and found 6 videos she didn't know existed. the engineering team uses it before architecture discussions to check if someone already presented on the approach.

the codebase is three modules and a Main. maybe 200 lines of haskell total. not counting the cabal file which is honestly longer than some of the modules.

u/scheemunai_ — 4 days ago
▲ 53 r/Zig

wrote a small http server in zig that searches across youtube video transcripts stored in sqlite and the binary is 1.2MB

i've been learning zig for a few months mostly through small projects and wanted to build something i'd actually use at work. i work at a dev tools startup and we have about 140 youtube videos. product demos, engineering deep dives, recorded design reviews, conference talks. the usual problem where nobody can find anything because video titles are meaningless.

so i wrote a search tool for it in zig.

the server uses std.http.Server. one GET endpoint that takes a query parameter and returns json results. each result has the video title, date, speaker, a snippet of the transcript around the match, and the youtube link. there's also a static file handler that serves a single html page with a search box. the html is embedded in the binary with @embedFile so there's nothing to deploy except the binary and the sqlite database.

the sqlite part uses the zig-sqlite wrapper from vrischmann. the database has one table for video metadata and an FTS5 virtual table for the transcripts. the search runs a MATCH query on the FTS5 table and uses snippet() for the excerpt. queries come back in under 5ms for 140 videos.

for pulling the actual transcripts i wrote a separate ingestion tool. it calls transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the ingestion tool is a zig executable that takes a text file of youtube urls, calls the api for each one using std.http.Client, parses the json response with std.json, and inserts into sqlite. both the FTS5 table and the metadata table. maybe 200 lines for the ingestion tool.

the server itself is about 350 lines. the part i enjoyed most was the allocator discipline. the request handler uses an arena allocator that gets reset after each response, so there's no per-request allocation overhead piling up. coming from python where i never think about memory this was a different way of working. not harder exactly, just more deliberate.

the final binary is 1.2MB statically linked. i copied it to our internal tools server along with the sqlite file and that was the deployment. no runtime, no container, no dependencies. it starts in about 4ms which i know because i timed it out of curiosity.

the team uses it a few times a day. mostly before design reviews to check if someone already presented on the approach being proposed. one of the senior engineers started using it to find his own past talks which is a use case i didn't think of.

u/scheemunai_ — 7 days ago
▲ 10 r/scala

built a small http4s app that indexes youtube transcripts into postgres and the whole thing is about 350 lines of scala

i work at a fintech company and we have about 170 youtube videos. internal engineering talks, recorded architecture reviews, vendor integration walkthroughs, a bunch of conference talks people bookmarked over the years. all sitting in a shared playlist that nobody scrolls through because the titles are useless. someone would ask in slack "didn't we do a talk about event sourcing last year" and nobody could find it.

i built a small app to make them searchable. http4s server with a single endpoint that takes a query string and returns matching videos with transcript snippets. postgres backend with full text search. a simple html frontend with one search box because i didn't want to deal with a separate frontend project.

for pulling the transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the ingestion side is a cli tool built with decline. give it a youtube url or a file of urls and it pulls each transcript through the api, parses it, and inserts it into postgres using doobie. i wrote a TranscriptRepo algebra with an insert and a search method. the search uses postgres tsvector with a GIN index and ts_headline for the snippet extraction.

the http4s routes are one file. GET /search with a query param, returns json or html depending on the accept header. i used circe for the json encoding. the html response is just a scalatags template inlined in the route because i didn't want to set up a template engine for one page.

the part that felt nice was how well doobie handles the postgres full text search queries. the tsvector and tsquery types map cleanly and i wrote a custom fragment for the ts_headline call. no raw string interpolation, just composed fragments. it reads well.

deployed it on our internal k8s cluster as a docker container. the docker image is built with sbt-native-packager. the whole app starts in about 2 seconds which is fine for an internal tool.

about 170 videos indexed. the engineering team uses it before design reviews to check whether someone already gave a talk on the approach they're considering. someone from product started using it to find specific things our CTO said in recorded all-hands which i didn't anticipate.

the codebase is a TranscriptRepo trait, a SearchService class, the http4s routes, the decline cli, and a Main object. maybe 350 lines total. no cats-effect streaming or fs2 for this one, just straightforward IO with doobie transactions.

u/scheemunai_ — 7 days ago
▲ 26 r/elixir

wrote a phoenix liveview app that searches across youtube video transcripts and the real time search feels absurdly good

i work at a mid size marketing agency and we have about 200 youtube videos. client case study recordings, internal strategy sessions, conference talks from our founders, onboarding walkthroughs for new hires. all unlisted and shared through notion. nobody can find anything because the only way to search is by video title which is usually something useless like "Q3 strategy call sept 14."

i've been looking for an excuse to build something real in phoenix liveview so i used this.

the app is a single liveview page. search box at the top, results below. as you type, results update live through the socket. each result shows the video title, date, speaker, and a snippet of the transcript around the matching text with the match highlighted. click the result and it opens the youtube video.

the backend is postgres with full text search. tsvector on the transcript column, GIN index, ts_headline for the snippet extraction. the liveview handles the search with a debounce on the phx-change event so it's not hammering postgres on every keystroke. i set it to 250ms which feels right. fast enough that it seems instant but not so aggressive that it fires on every character.

for pulling the actual transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

i wrote a mix task for ingestion. give it a youtube url and it pulls the transcript, parses it, and inserts it into the database. added a --file flag so i could point it at a text file with all 200 urls and let it run through them. the whole ingestion took maybe 3 minutes.

the thing that sold my coworkers on it was the liveview search. i demoed it in a meeting and people immediately started shouting out search terms to try. someone typed in a client name and found every video where that client was discussed. someone else searched for "attribution modeling" and found a conference talk from 2022 that nobody remembered existed.

the codebase is small. one liveview module, one context module with the search query, the mix task for ingestion, and two templates. maybe 300 lines of elixir total. deployed it on fly.io on the free tier since it's just internal and the traffic is light.

the part i keep coming back to is how well liveview fits this use case. server rendered search with live updates over websockets and zero hand-written javascript. the search box, the debounce, the result list, the highlighting, all just liveview. i would have needed react or vue for this in most other frameworks.

u/scheemunai_ — 9 days ago
▲ 1 r/Kotlin

built a ktor api that indexes youtube video transcripts and our team uses it more than any internal tool i've made

i work at a small software company and we have a bunch of youtube content. tutorials our devrel team makes, conference talks employees gave, recorded internal tech talks. the usual complaint was nobody could find anything. someone would remember a talk about coroutines from 6 months ago but not which video it was, and then waste time scrolling through our channel.

i built a small ktor service that pulls transcripts from youtube videos and makes them searchable. took me about a day.

the setup is pretty simple. a ktor server with three endpoints: one to submit a youtube url for processing, one to search across all stored transcripts, and one to list recent additions. the data goes into a postgres database. the transcript column has a GIN index with tsvector for full text search.

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the kotlin side is just an HttpClient call to the api, kotlinx.serialization to parse the response, and an Exposed DAO insert into postgres. the submit endpoint kicks off a coroutine that processes the video in the background so the request returns immediately. the search endpoint runs a tsquery against the tsvector column and returns results ranked by ts_rank.

the whole service is maybe 200 lines of kotlin. the data classes, the three route handlers, the database table definition, and the background processing logic. i didn't bother with a frontend. the devrel team uses it through a simple html form and the rest of us just curl it.

about 190 videos indexed. the full text search is fast, under 20ms on every query i've tested. someone searches "dependency injection testing" and gets back every video where that came up with a snippet of the transcript around the match.

the thing that made me appreciate ktor for this was how little boilerplate there was. the coroutine scope for background processing was already there. the serialization plugin handled json without any config. the exposed dsl for postgres was clean. i've built similar tools in spring boot before and it would've been twice the code.

running on a single $5 digitalocean droplet with postgres on the same machine. handles our usage fine which is about 10-15 searches a day from maybe 8 people.

u/scheemunai_ — 11 days ago

i work on the content team at a B2B saas company and my manager wanted data on what competitors are talking about on youtube. not vibes, actual data. how often they publish, what topics they cover, which topics are getting crowded, where nobody is making content yet.

i couldn't find a tool that does this so i built a dataset myself and visualized it in tableau.

the data source is a google sheet with one row per youtube video. columns are video title, channel name, publish date, video length, and up to 3 topic tags i assign manually. the last column is the full transcript.

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

paste the url, grab the transcript, drop it into the sheet. tag the topics based on what the video actually covers. about 2-3 minutes per video.

the tableau workbook connects to the google sheet and has four views. the first is a horizontal bar chart of topic frequency across all competitors. "AI features" has 42 videos across 6 competitors, "data migration" has 7. that tells me where the market attention is. the second view is a heat map with competitors on rows and topics on columns, colored by video count. you can see at a glance which competitors are heavy on which topics and where they're ignoring things.

the third view is a timeline. i plotted publish date on the x axis and colored by topic. when you see a cluster of the same color in the same month it means multiple competitors jumped on the same topic around the same time. that usually means something happened in the market. the fourth view is a simple table with a topic filter where you can read transcript snippets. my manager uses this one to quickly scan what competitors said about a specific topic without watching videos.
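the first two views boil down to two counts over the sheet: videos per topic, and videos per competitor and topic. a minimal sketch of that aggregation, assuming each row is a dict with hypothetical `channel` and `tags` keys (the real sheet's column names may differ):

```python
from collections import Counter


def topic_counts(rows):
    """Return (videos per topic, videos per (channel, topic))."""
    freq = Counter()
    matrix = Counter()
    for row in rows:
        # set() guards against the same tag being entered twice on one video.
        for tag in set(row["tags"]):
            freq[tag] += 1
            matrix[(row["channel"], tag)] += 1
    return freq, matrix


def gaps(freq, max_videos=2):
    """Topics with at most max_videos videos: candidates for gap analysis."""
    return sorted(topic for topic, n in freq.items() if n <= max_videos)
```

`freq` corresponds to the bar chart and `matrix` to the heat map cells. a missing `(channel, topic)` key reads as 0, which is exactly the empty cell where a competitor is ignoring a topic.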

about 310 videos tracked across 7 competitor channels over 10 months. i add new ones every friday, takes about 30 minutes for the week's uploads.

the gap analysis is the part that actually influenced our content calendar. we found 4 topics where competitors had barely any content and we had zero. two of those became our most viewed videos last quarter because we were early.

the dashboard is published to tableau server so the whole content team can access it. my manager pulls it up in every monthly planning meeting now.

u/scheemunai_ — 15 days ago
▲ 17 r/csharp

i work at a mid-size company and our training department has hundreds of youtube videos. internal recordings, vendor demos, conference talks they send people to watch. nobody could ever find anything. someone would say "there was a video about setting up azure AD" and then spend 20 minutes scrolling through playlists.

i wrote a console app in c# that fixed this over a weekend.

the app takes a youtube url, pulls the full transcript, and inserts it into a sql server table with columns for video title, channel, date, category, and the transcript text. the transcript column has a full-text index on it.

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the c# side is straightforward. HttpClient to call the api, JsonSerializer to deserialize the response, SqlCommand to insert. maybe 80 lines in the main method. i added a simple loop that reads urls from a csv file so we can batch import.

the search part is a separate razor pages app. one page with a text box that runs a CONTAINS query against the full-text index. results come back with the video title, date, category, and a snippet of the transcript around the matching terms. link to the youtube video on each result.

the full-text search is the part that actually matters. someone searches "azure AD conditional access" and gets back every video where someone said those words. not just videos i remembered to tag with "azure AD" but every mention across 400+ transcripts.

we're at about 420 videos now. the training team adds new ones through a simple form i tacked onto the razor pages app. the IT help desk started using it too which i didn't plan for. when someone opens a ticket about a process they've seen in a training video, the help desk searches the transcript database and sends them the exact video with a timestamp estimate.

the whole thing runs on an internal IIS server we already had. sql server express for the database. total cost was zero because we had the infrastructure sitting there.

u/scheemunai_ — 16 days ago
▲ 1 r/Upwork

i do content writing and repurposing on upwork. been on the platform about 2 years, mostly blog posts and social media copy. decent JSS, steady work, but always hustling for the next contract.

about 4 months ago a client asked if i could pull transcripts from their youtube videos and turn them into blog posts. they had about 50 videos on their channel and wanted written versions of each one. i said yes without really thinking about how i'd do the transcript part.

turns out pulling transcripts manually is miserable. youtube's built-in transcript feature gives you a wall of text with no punctuation or formatting. copying it out takes forever and you still have to clean it up.

i found transcript api which does it way faster:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

paste the url, get the full transcript back, clean and formatted. what used to take me 20-30 minutes per video now takes about 2 minutes.

i finished that first job way ahead of schedule. the client was happy and put me on a monthly retainer to do 8 videos a month. that's when i realized this could be its own service.

i added "youtube video to blog post" and "youtube transcript extraction and formatting" as services on my profile. didn't expect much but i've gotten 6 clients from it in 4 months. most of them are small business owners or course creators who have youtube content but want it in written form too. some just want clean transcripts. others want me to turn the transcript into a blog post or newsletter. either way the transcript is the starting point and it takes me almost no time to get it.

the margins on this work are way better than regular content writing. a blog post from scratch might take me 3-4 hours. turning a transcript into a blog post takes maybe 90 minutes because the content already exists, i'm just restructuring and editing it.

i'm averaging about $1,200/month from transcript-related work now. it's not my whole income but it's the most predictable part. two of the clients are on monthly retainers and the rest send batches every few weeks.

the funny part is nobody on upwork seems to be offering this as a specific service. there are thousands of "content writer" profiles but almost nobody positioning themselves around video-to-written content specifically. the niche is small but the competition is basically zero.

u/scheemunai_ — 18 days ago
▲ 61 r/aws

i work at a consulting firm and we do a lot of knowledge sharing through youtube. internal training videos, recorded client workshops, conference talks people found useful. the problem was the same as everyone's: nobody could find anything. someone would remember a video existed but not which one, and then waste 20 minutes scrubbing through playlists.

i built a serverless pipeline to fix it. the flow is: drop a youtube url into an SQS queue, a lambda picks it up, pulls the full transcript, and writes it to a dynamodb table with the video title, uploader, date, and tags. a second lambda powers a simple API gateway endpoint that does a search across the transcripts.

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the lambda that processes videos is about 60 lines of python. it calls the api, parses the json response, and does a put_item to dynamo. the search lambda is even shorter. it does a scan with a filter expression on the transcript attribute. i know scan is bad practice on large tables but with 200 rows it returns in under 100ms so i'm not going to overcomplicate it.
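one detail worth showing for a scan like this: DynamoDB reads at most 1 MB per Scan call and applies the filter after reading, so with full transcripts in the table a single call can return only part of the matches. a minimal sketch of following the pagination, assuming the attribute is named `transcript`. `scan_page` is a stand-in for something like boto3's `table.scan`, passed in so the sketch runs without AWS.

```python
def scan_all(scan_page, term):
    """Collect every item whose transcript contains term, across pages."""
    kwargs = {
        # contains() is a case-sensitive substring match.
        "FilterExpression": "contains(#t, :q)",
        "ExpressionAttributeNames": {"#t": "transcript"},
        "ExpressionAttributeValues": {":q": term},
    }
    items = []
    while True:
        page = scan_page(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the scan where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

a page can come back with zero items and still carry a `LastEvaluatedKey`, which is why the loop checks the key rather than the item count.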

the frontend is a static site in an S3 bucket behind cloudfront. one search box, one results list. when you search it shows matching videos with a snippet of the transcript around your search term. click through to the youtube video. that's it.
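
the snippet-around-the-match part can be sketched in a few lines. this is an illustration under assumptions, not the code from the search lambda: the function name and the `width` parameter are made up, and it only looks at the first match.

```python
def snippet(transcript, term, width=80):
    """Return the text around the first case-insensitive match, or None."""
    pos = transcript.lower().find(term.lower())
    if pos == -1:
        return None
    # Take `width` characters of context on each side of the match.
    start = max(0, pos - width)
    end = min(len(transcript), pos + len(term) + width)
    # Mark the cut points so the reader can tell the text was trimmed.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(transcript) else ""
    return prefix + transcript[start:end].strip() + suffix
```

matching on a lowercased copy keeps the search case-insensitive while the snippet itself keeps the original casing.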

the whole thing costs almost nothing. lambda invocations are within the free tier. dynamodb is on-demand, and 200 items read maybe 50 times a day is basically free. cloudfront and S3 are pennies. my last bill for this was $0.38.

about 200 videos in there now. the part i didn't expect is that the sales team started using it to find specific things clients said in recorded workshops. they search for a client's product name and find every time it was mentioned across all our training content. that wasn't the original purpose but it turned out to be the most useful part.

the only thing i'd do differently is use opensearch instead of dynamo scans if we get past maybe 500 videos. but i've been saying that for 3 months and we're still at 200 so it hasn't been worth the added complexity.

u/scheemunai_ — 20 days ago
▲ 1 r/Slack

our team shares youtube links in slack constantly. conference talks, competitor product demos, industry podcasts, tutorials. the problem is nobody watches them. someone drops a 45 minute video in a channel and it just sits there. everyone's busy, nobody has time to watch, and the knowledge dies in the thread.

i built a simple bot that fixed this. whenever someone posts a youtube link in specific channels, the bot grabs the full transcript, sends it to openai for a 3-paragraph summary, and posts the summary as a threaded reply under the original message. takes about 20 seconds.

now people actually engage with the content. they read the summary, sometimes they jump to the full video if the summary sounds relevant, sometimes they reply with questions or their own take. the channel went from a graveyard of unwatched links to actual discussions.

the bot is a small node app. slack event subscription listens for message events, regex matches youtube urls, pulls the transcript, hits openai, posts back via the slack api.
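
a rough sketch of the url-matching step, written in python for illustration even though the bot itself is node. the pattern is an assumption about which url shapes matter (watch, youtu.be, shorts, embed, live), and it relies on video ids being 11 characters. slack wraps links in angle brackets, sometimes with a `|label` part, so the pattern stops at those characters and accepts both `&` and `&amp;` between query parameters.

```python
import re

# Hypothetical pattern: covers common YouTube URL shapes, not every variant.
YOUTUBE_URL = re.compile(
    r"(?<![\w.-])(?:https?://)?(?:www\.|m\.)?"
    r"(?:youtube\.com/(?:watch\?(?:[^\s<>|]*(?:&amp;|&))?v=|shorts/|embed/|live/)"
    r"|youtu\.be/)"
    r"([A-Za-z0-9_-]{11})"
)

def extract_video_ids(text):
    """Return unique video ids found in a message, in order of appearance."""
    seen = []
    for video_id in YOUTUBE_URL.findall(text):
        if video_id not in seen:
            seen.append(video_id)
    return seen
```

de-duplicating per message avoids summarizing the same video twice when someone pastes a link more than once.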

for the transcript part i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the openai prompt is simple: "summarize this video transcript in 3 short paragraphs. focus on key takeaways and any actionable insights. keep it under 200 words." that constraint matters because nobody reads a wall of text in slack either.

been running for about 6 weeks. the bot has summarized maybe 120 videos. the thing that surprised me most is how it changed behavior. people share more links now because they know the team will actually see the content. our CEO started using it to share investor interview videos which was not something i anticipated.

only issue is videos without captions obviously don't work, and the bot just silently skips those. also had to add a 10 minute cooldown per channel because one person dropped 8 links at once and the bot spammed the channel.
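
the per-channel cooldown could look something like this. it's a sketch in python rather than the bot's node code, the class and method names are hypothetical, and it keeps state in memory, so a restart resets every channel.

```python
import time

class ChannelCooldown:
    """Allow at most one bot reply per channel per cooldown window."""

    def __init__(self, seconds=600, clock=time.monotonic):
        self.seconds = seconds      # 600 s = the 10 minute cooldown
        self.clock = clock          # injectable so it can be tested
        self._last = {}             # channel id -> time of last allowed reply

    def allow(self, channel_id):
        now = self.clock()
        last = self._last.get(channel_id)
        if last is not None and now - last < self.seconds:
            return False            # still cooling down, skip this link
        self._last[channel_id] = now
        return True
```

the bot would call `allow(channel_id)` before summarizing and skip the link when it returns `False`. a queue that drains slowly would be the alternative if dropped links matter.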

u/scheemunai_ — 21 days ago

our L&D team records all their training sessions and uploads them to youtube as unlisted videos. they also send people external youtube tutorials and conference talks as learning material. the problem was nobody could search any of it. you either remembered which video had the information or you rewatched a bunch of them hoping to find it.

i built a power automate flow that fixed this in an afternoon.

the trigger is a new row in a sharepoint list. someone on the training team pastes a youtube url into the list, the flow picks it up, pulls the full transcript, and saves it back into a multi-line text column on the same row. they also fill in a title and category column manually.

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the http action calls the api with the youtube url and the parse json action grabs the transcript text from the response. two actions for the transcript part.

the useful bit is that sharepoint search indexes everything, including the multi-line text column. so now anyone in the company can go to the sharepoint site, search "how to configure SSO" and get back every training video where someone explained it. they click through to the sharepoint list item, read the transcript or open the youtube link. no more rewatching hour-long recordings to find a 3 minute explanation.

the training team added about 160 videos in the first month. they were excited enough that they went back and added all their old recordings too. the IT team found out about it and started adding vendor product demo videos. i didn't plan for that but it works the same way.

the whole flow is 5 actions. trigger, http request, parse json, condition to check for errors, update item. took me maybe 2 hours including testing. the sharepoint list took another 30 minutes to set up with the right columns and views.

the only issue i've hit is sharepoint's multi-line text column has a size limit. really long videos like 2+ hour recordings get truncated. for those i save the transcript as a txt file in a document library and link to it from the list item instead. not elegant but it works.

u/scheemunai_ — 22 days ago

i manage training content for a mid-size company and we have a ton of youtube videos. internal training recordings, conference talks from our industry, vendor product demos. the problem was nobody could ever find anything. someone would say "i remember there was a video about configuring SSO" and then spend 20 minutes scrolling through a youtube playlist trying to find it.

so i built an appsheet app for it. the backend is just a google sheet with columns for video title, channel, date, topic category, and a full transcript column.

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

paste the url, grab the transcript, drop it into the sheet. two minutes per video.

the appsheet part is where it gets useful. i built a simple app with a search view that searches across all columns including the transcript text. so when someone needs to find the video where the VP explained our Q3 product roadmap they just type "Q3 roadmap" and the app finds it. they can see the transcript right in the app and jump to the video from there.

i also added a category filter so people can browse by topic. "onboarding," "security," "product updates," etc. and a recently added view sorted by date so the team can see what's new.

the whole thing took me maybe 3 hours to build in appsheet. the sheet does the heavy lifting for storage and appsheet just makes it searchable and browsable on any device. our field team uses it on their phones which is the part that actually got my manager excited.

about 220 videos in there now. it went from "my side project" to something the entire L&D team uses daily. my manager asked if i could add their webinar recordings too so now i'm expanding it.

the funniest part is people keep asking me how long it took to build this "custom app." three hours. on a saturday. with no code.

u/scheemunai_ — 23 days ago
▲ 10 r/grafana

this is a weird one but hear me out. i work in platform engineering and our team watches a lot of youtube content to stay current. kubecon talks, hashiconf recordings, stuff like that. i wanted a way to see what topics are getting talked about more or less over time without manually tracking anything.

so i built a pipeline that pulls transcripts from youtube tech channels, does some basic keyword extraction, and pushes the data into postgres. then i built a grafana dashboard on top of it.

for pulling transcripts i use transcript api. setup was:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the pipeline runs weekly via a cron job. it grabs new videos from a list of channels i follow, pulls the transcript, extracts keyword frequencies using a basic tf-idf approach, and writes the results to postgres with the video publish date.
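
for anyone wondering what "basic tf-idf" means here, a minimal sketch in plain python. this is one common formulation (term frequency times log of inverse document frequency), not necessarily the exact variant the pipeline uses, and the tokenizer is a deliberately naive assumption with no stop-word handling.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into alphanumeric tokens (naive on purpose)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf(transcripts):
    """Return one {term: score} dict per transcript."""
    docs = [Counter(tokenize(t)) for t in transcripts]
    n = len(docs)
    # Document frequency: in how many transcripts each term appears.
    df = Counter()
    for counts in docs:
        df.update(counts.keys())
    scores = []
    for counts in docs:
        total = sum(counts.values())
        scores.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return scores
```

terms that show up in every transcript score zero, which is what pushes filler words down and leaves topic words like "ebpf" at the top.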

the grafana side is where it gets interesting. i have a time series panel showing keyword frequency over time. you can literally watch "ebpf" go from barely mentioned in 2021 to every other talk in 2024. "service mesh" peaked around 2022 and has been declining. "platform engineering" started spiking mid 2023.

there's also a table panel that shows the most mentioned topics per channel so you can see what each conference or creator is focusing on. and a stat panel showing the total hours of content indexed which is at about 2000 hours now.

the dashboard started as a fun experiment but my team actually uses it now for planning. when we're deciding which technologies to evaluate we look at the trend lines. if something is trending up across multiple channels it's probably worth investigating. if it peaked 2 years ago and is declining we're more cautious.

the data source is just a postgres datasource in grafana. nothing custom. the queries are basic sql with time bucketing. the whole thing took maybe 2 days to build and grafana did the hard part of making it look presentable.

u/scheemunai_ — 25 days ago