u/straightedge23

built a multi-tenant property analysis app with supabase RLS and edge functions and the whole backend is like 200 lines

a friend who invests in rentals asked me to build him a tool where he pastes an address and sees if the numbers work. zestimate vs asking price, rent estimate, cash flow at different down payments, price history chart. he showed it to two other investors and they wanted their own accounts. so now i needed multi-tenancy.

supabase made the multi-tenant part almost trivial. the schema is simple:

sql

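create table properties (
  id uuid default gen_random_uuid() primary key,
  user_id uuid references auth.users(id),  -- owner; this is what the policy checks
  address text,
  data jsonb,                              -- full api response
  deal_score text,                         -- green / yellow / red
  created_at timestamptz default now()
);

-- once RLS is on, rows are only visible when a policy allows them
alter table properties enable row level security;
-- "for all" with only a using clause applies the same check to inserts and updates
create policy "users see own properties"
  on properties for all
  using (auth.uid() = user_id);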

that's it for multi-tenancy. each investor only sees their own saved properties. no middleware, no tenant filtering in every query, no "where tenant_id = X" sprinkled everywhere. RLS handles it at the database level. i've built multi-tenant apps before in django and rails and it was always way more code than this.

for pulling property data i wrote a supabase edge function that takes an address, calls a rest api called zillapi that returns zillow data as json, calculates a deal score, and returns the result. the edge function is about 40 lines of typescript. the score is based on rent-to-price ratio, zestimate gap, and price trend direction. green/yellow/red.
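
a minimal python sketch of how a three-signal score like that could work (the real edge function is typescript). the function name, the 1% rent-to-price cutoff, and the "count the passing signals" rule are all assumptions for illustration, not the actual scoring logic:

```python
def deal_score(price, monthly_rent, zestimate, price_trend_pct):
    """Return 'green', 'yellow', or 'red' from three simple signals.

    All thresholds here are hypothetical placeholders.
    """
    points = 0

    # signal 1: rent-to-price ratio (monthly rent as a fraction of asking price)
    if price > 0 and monthly_rent / price >= 0.01:
        points += 1

    # signal 2: zestimate gap (asking price at or below the zestimate)
    if zestimate > 0 and price <= zestimate:
        points += 1

    # signal 3: price trend direction (positive means values are rising)
    if price_trend_pct > 0:
        points += 1

    if points == 3:
        return "green"
    if points == 2:
        return "yellow"
    return "red"
```

counting signals keeps the output explainable: an investor can see which of the three checks failed instead of trusting an opaque number.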

the jsonb column for storing the full api response was a good call. the api returns 300+ fields per property and i only display about 15. but when one of the investors asked me to add tax assessed value to the dashboard i didn't need a new api call or a migration. the data was already sitting in the jsonb. just updated the frontend to read one more field.

i also set up a scheduled edge function that runs every sunday. it loops through all saved properties across all users and refreshes the data from the api. if any zestimate changed more than 5% it flags it. one investor caught a $30k valuation drop on a property he was watching because the refresh flagged it on a monday morning.
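
the 5% check itself is tiny. a python sketch, assuming the previous zestimate is stored next to the property and the function name is made up:

```python
def zestimate_changed(old_value, new_value, threshold=0.05):
    """True when the zestimate moved more than `threshold` (as a fraction) in either direction."""
    if old_value <= 0:
        # no usable baseline, so there is nothing to compare against
        return False
    return abs(new_value - old_value) / old_value > threshold
```

using the absolute change means jumps get flagged as well as drops, which matches "changed more than 5%".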

for the ai side i also set up a skill so the investors can ask claude about their properties:

npx clawhub@latest install zillow-full

three investors using it now. the whole backend is supabase auth, one table with RLS, two edge functions (lookup + scheduled refresh), and a next.js frontend. no separate server, no celery, no redis. total cost is $0 on the supabase free tier.

reddit.com
u/straightedge23 — 14 hours ago
▲ 12 r/django

management commands + celery beat turned my weekend project into something 3 investors use daily

a friend who buys rental properties asked me to build him a simple dashboard where he pastes an address and sees if the numbers work. zestimate vs asking price, estimated rent, cash flow at different down payments, school ratings. he was doing this manually on zillow 10-15 times every morning.

the django side came together fast. a Property model with the address, the cached api data as a jsonfield, and a last_updated timestamp. a single view that takes an address, hits a rest api called zillapi that returns zillow data as json, saves it to the model, and returns the summary. basic template with the numbers laid out. nothing fancy.

the part that made this actually useful was management commands. i wrote one called refresh_properties that loops through every saved property and pulls fresh data from the api. hooked it up to celery beat running every sunday night. so every monday morning when my friend opens the dashboard, all his saved properties have current zestimates and rent estimates without him doing anything.

i also wrote a find_deals command that searches active listings in his target zip codes and flags anything where the asking price is more than 10% below the zestimate. that one runs daily. if it finds something it sends him a slack message with the address and the numbers. he's gotten 3 actual leads from this that he wouldn't have found manually.
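
a rough sketch of the filter and the message text, in plain python with no django or slack client involved. the dict keys (`address`, `price`, `zestimate`) and the function names are hypothetical, not the api's real field names:

```python
def discount_pct(asking_price, zestimate):
    """How far the asking price sits below the zestimate, in percent."""
    if zestimate <= 0:
        return 0.0
    return (zestimate - asking_price) / zestimate * 100


def find_deals(listings, min_discount_pct=10.0):
    """Keep listings priced more than `min_discount_pct` below their zestimate."""
    return [
        listing for listing in listings
        if discount_pct(listing["price"], listing["zestimate"]) > min_discount_pct
    ]


def slack_message(listing):
    """Format one flagged listing as a single line of text."""
    pct = discount_pct(listing["price"], listing["zestimate"])
    return (
        f"{listing['address']}: asking ${listing['price']:,} "
        f"vs zestimate ${listing['zestimate']:,} ({pct:.1f}% below)"
    )
```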

the jsonfield for caching the api response was the right call. the api returns 300+ fields per property and i only display about 15 of them. but when my friend asks me to add something new to the dashboard i don't have to make another api call. the data is already sitting in the jsonfield. last week he wanted tax assessed value added. took me 5 minutes because the field was already cached.

for the ai side i set up a skill so he can ask claude about his saved properties:

npx clawhub@latest install zillow-full

two more investors from his meetup group asked for access. i added a simple user model with a foreign key to their saved properties and basic login. the multi-tenant part took maybe an afternoon.

total infra is a $7/month render instance running django + celery + redis + postgres. handles 3 users doing 30-40 lookups a day without breaking a sweat.

u/straightedge23 — 1 day ago
▲ 61 r/vim+2 crossposts

wrote a bash + vim workflow that pulls youtube video transcripts into buffers and searches them with fzf

i work at a devops consultancy and we have about 180 youtube videos. recorded architecture reviews, internal tech talks, client postmortems, vendor integration demos, conference presentations people bookmarked. they're all shared in a markdown file in our wiki which is basically a list of youtube links with dates. useless for finding anything unless you remember exactly when the video was recorded.

i wanted a way to search these videos by what was actually said in them without leaving the terminal. so i built a workflow around vim, bash, and fzf.

the first piece is a bash script that takes a youtube url and pulls the full transcript using transcript api. it saves the transcript as a plain text file named after the video title with the date prepended. one file per video. all the transcripts live in a directory called ~/transcripts.

the second piece is an fzf wrapper script. it runs fzf with --preview against the transcript directory. as you type, fzf fuzzy matches across all transcript files and the preview window shows the matching file with the match highlighted. select a result and it opens the transcript in vim with the cursor on the first match. i bound this to a key in my shell so i can hit ctrl-t and start searching immediately.

the vim side is where it gets useful. each transcript file has a yaml front matter block at the top with the video title, date, speaker, tags, and the youtube url. i wrote a small vim function that reads the url from the front matter and opens it in the browser. so the workflow is: fzf to find the video, vim to read the transcript, one keypress to open the actual youtube video if i need to watch it.

the ingestion script is about 30 lines of bash. curl to call the api, jq to parse the json and extract the transcript text, a few lines to generate the yaml front matter, and tee to write the file. i have a text file with all 180 urls and a for loop that processes them. the whole batch ran in about 4 minutes.
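
the original is bash, but the file-writing half is easy to sketch in python. this assumes the title, date, speaker, tags, and url are already in hand (the api call is left out), that tags are simple words, and that the helper names are made up:

```python
import re
from pathlib import Path


def slugify(title):
    """Lowercase the title and collapse anything non-alphanumeric into single dashes."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def front_matter(meta):
    """Build a small YAML front matter block from a metadata dict."""
    lines = ["---"]
    for key in ("title", "date", "speaker", "tags", "url"):
        value = meta[key]
        if isinstance(value, list):
            # flow-style list; fine for simple one-word tags
            rendered = "[" + ", ".join(value) + "]"
        else:
            # double-quote scalars and escape backslashes and quotes
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            rendered = '"' + escaped + '"'
        lines.append(f"{key}: {rendered}")
    lines.append("---")
    return "\n".join(lines)


def write_transcript(directory, meta, text):
    """Write one file per video, named with the date prepended to the title slug."""
    path = Path(directory) / f"{meta['date']}-{slugify(meta['title'])}.txt"
    path.write_text(front_matter(meta) + "\n\n" + text + "\n", encoding="utf-8")
    return path
```

putting the date first in the filename means a plain `ls` sorts the transcripts chronologically.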

the fzf wrapper is maybe 15 lines. the vim function is 8 lines. the keybinding is one line in my vimrc.

about 180 videos indexed as plain text files. the consultants on my team use it before client calls to search for whether we've discussed a similar architecture before. i use it almost daily to find specific things from recorded tech talks. the nice thing about plain text files is that grep works, ripgrep works, fzf works, vim's built in search works. no special tooling needed beyond what's already on my machine.

the whole thing took an afternoon and i haven't changed anything since.

u/straightedge23 — 1 day ago
▲ 6 r/AZURE

built an internal video search tool on azure functions + cosmos db + ai search and the monthly cost is under $2

i work at a mid size insurance company and we have about 220 youtube videos. compliance training recordings, product update walkthroughs, quarterly all-hands, vendor integration demos, HR onboarding sessions. all sitting in a shared youtube channel that people have bookmarked but never actually browse because the titles are useless.

the compliance team was the trigger. they needed a way to verify whether a specific topic was covered in a training video without watching the entire thing. they were literally assigning people to watch 45 minute recordings and take notes. that felt like something a computer should be doing.

i built a search pipeline on azure.

first piece is an azure function (python, http trigger) that takes a youtube url, pulls the full transcript, and writes it to cosmos db. each document has the video title, date, department, tags, youtube link, and the full transcript text. second function runs on a cosmos db change feed trigger and pushes the transcript into an azure ai search index. the index is configured with full text search on the transcript field.

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the python function calls the api with requests, parses the json, and writes to cosmos. maybe 35 lines for the ingestion function.

azure ai search handles the query side. i set up a simple index with searchable fields for transcript and title, and filterable fields for department and date. the search is good out of the box. BM25 ranking and the simple query syntax (so people can use quotes for phrases) are the defaults, and hit highlighting on the transcript field only needs a highlight parameter on the query. i didn't have to tune any of it.
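
for reference, a python sketch of what a query against that kind of index looks like over the REST api. it only builds the request and doesn't send it. the service name, index name, and key are placeholders, and the field names (`transcript`, `title`, `department`) are assumed from the description above:

```python
import json
import urllib.request

API_VERSION = "2023-11-01"  # one of the published Search REST api-versions


def build_search_request(service, index, api_key, text, department=None, top=10):
    """Build (but do not send) a POST to the Azure AI Search 'search documents' endpoint."""
    url = (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/search?api-version={API_VERSION}"
    )
    body = {
        "search": text,                       # quotes inside `text` give phrase search
        "queryType": "simple",                # the default simple query syntax
        "searchFields": "transcript,title",
        "highlight": "transcript",            # ask for hit highlights on this field
        "top": top,
    }
    if department:
        # OData string literal: single quotes are escaped by doubling them
        body["filter"] = "department eq '" + department.replace("'", "''") + "'"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
```

passing the result to `urllib.request.urlopen` would run the search. a browser frontend would send the same json body with `fetch`.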

the frontend is a static web app on azure. one html page with a search box that calls the ai search REST api directly with an api key scoped to query-only permissions. no backend needed for the search itself. the function app is only for ingestion.

the cost breakdown is what surprised me. cosmos db on serverless tier for 220 documents with occasional reads costs basically nothing. the free tier of ai search gives you 50MB of index storage and 10,000 documents which is way more than i need. the function app is on consumption plan so it only bills when the ingestion runs. azure static web apps has a free tier. total monthly cost last month was $1.47 and most of that was the cosmos db request units.

about 220 videos indexed. the compliance team uses it to verify training coverage. HR uses it during onboarding to find specific videos for new hires. the engineering team uses it to search for recorded architecture discussions. the part i didn't expect was people searching for specific things their CEO said in all-hands recordings. apparently that's useful for writing internal proposals.

u/straightedge23 — 2 days ago

built a grafana dashboard that tracks zestimate drift across my rental portfolio and it caught a $40k valuation drop i would have missed

i own 6 rental units and i wanted a way to passively monitor what's happening with property values without manually checking zillow every week. so i built a pipeline that pulls zestimates and rent estimates on a schedule and pushes the data into postgres. then i put grafana on top of it.

for pulling property data i use a rest api called zillapi. it returns zillow data as json. zestimate, rent estimate, price history, tax assessed value, everything. i also set it up as an ai skill for when i want to ask questions about a property in natural language:

npx clawhub@latest install zillow-full

the pipeline is a python script that runs weekly via cron. it hits the api for each of my 6 properties, grabs the current zestimate, rent estimate, and tax assessed value, and appends a row to postgres with the timestamp. been running it since january so i have about 5 months of weekly snapshots now.

the grafana dashboard has a few panels:

a time series showing zestimate for all 6 properties on one chart. you can see them moving relative to each other. 5 of mine have been slowly climbing. one started dropping in march and i didn't notice until i looked at the dashboard in april. it had gone from $385k down to $344k over about 6 weeks.

turned out there was a new development being built 2 blocks away that was pulling comps down in that zip code. i wouldn't have caught that from casually checking zillow because i wasn't checking that property often. the trend line made it obvious. i ended up refinancing that property before the appraisal dropped further, which kept me from losing about $40k in equity.

there's also a stat panel showing total portfolio value (sum of all zestimates), a gauge panel showing rent estimate vs what i'm actually charging on each unit, and a table showing the gap between zestimate and tax assessed value for each property.

the rent gauge is the one i check most. two of my units were rented $200/month below what zillow's rent estimate said. i've since raised one and i'm waiting for the lease renewal on the other. that's potentially $4,800/year i was leaving on the table.
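
the arithmetic behind that number, as a small python sketch. the function name and the per-unit dollar figures in the usage note are made up; only the two $200/month gaps come from the post:

```python
def annual_rent_gap(units):
    """Sum the yearly shortfall across units.

    `units` is a list of (rent_estimate, actual_rent) monthly pairs.
    Units already at or above the estimate contribute nothing.
    """
    monthly_gap = sum(max(estimate - actual, 0) for estimate, actual in units)
    return monthly_gap * 12
```

two units each $200/month under the estimate, e.g. `annual_rent_gap([(1800, 1600), (2100, 1900)])`, works out to 2 × 200 × 12 = 4800.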

the whole thing took about a day to build. the api calls are simple, postgres is just inserts, and grafana did the hard work of making it look good. data cost is basically nothing since i'm only pulling 6 properties once a week.

u/straightedge23 — 4 days ago
▲ 15 r/Clojure

built a video transcript search tool in clojure and the whole thing is about 120 lines

i work at a small data consultancy and we have around 160 youtube videos. recorded client workshops, internal tech talks, vendor demos, conference presentations people found useful. all shared through a notion page with links. the usual problem where nobody can find anything because the titles are things like "workshop recording feb 2024" and you'd have to open each video and scrub through it to figure out what was covered.

i built a search tool for it in clojure last weekend.

the backend is a ring server with reitit for routing. one GET endpoint for search, one for serving the html page. postgres for storage with full text search. the queries use honeysql to build the tsvector match and ts_headline calls. i have one namespace for the db queries, one for the handlers, and one for the system startup. the whole server is maybe 80 lines across those three files.

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the ingestion side is a separate namespace with a -main that reads urls from a file and processes them sequentially. clj-http to call the api, cheshire for json parsing, next.jdbc to insert into postgres. each video gets a row with the title, date, speaker, tags, youtube url, and the full transcript. the ingestion namespace is about 40 lines.

the postgres full text search does the heavy lifting. tsvector on the transcript column with a GIN index. the honeysql for the search query ended up being surprisingly clean. something like:

;; assumes HoneySQL 2.x:
;; (require '[honey.sql :as sql] '[honey.sql.helpers :refer [select from where order-by]])
;; `query` is the user's search string and ends up as a bound parameter
(let [tsq [:websearch_to_tsquery [:inline "english"] query]]
  (-> (select :title :date :speaker :youtube_url
              [[:ts_headline [:inline "english"] :transcript tsq] :snippet])
      (from :videos)
      (where [:raw ["transcript_tsv @@ " tsq]])
      (order-by [[:ts_rank :transcript_tsv tsq] :desc])
      (sql/format)))

reads better than the raw SQL honestly.

the frontend is a single html page served from resources. plain html with a fetch call to the search endpoint. no clojurescript, no reagent, no build step. just a text input and a div that gets populated with results. the results show the video title, speaker, date, and a snippet of the transcript with the match highlighted.

i deploy it with an uberjar on a VPS we already had. java -jar and it's running. about 160 videos indexed. the consultants use it before client calls to look up whether we've covered a topic before. someone found a recorded workshop from 18 months ago that answered a question a client had asked that week.

the thing i like about this project is that it's small enough to hold the entire codebase in your head but useful enough that people actually open it daily. 120 lines of clojure, a postgres table, and a static html file.

u/straightedge23 — 4 days ago

built a serverless pipeline on cloud functions + bigquery that makes 300 youtube videos searchable by what was actually said in them

i work at a B2B saas company and we have about 300 youtube videos across multiple channels. product walkthroughs, customer webinars, sales training recordings, engineering demos, partner integration tutorials. the content team kept complaining that they couldn't find specific videos unless they remembered the exact title, and our sales team would spend 15 minutes before a call trying to find the demo where someone explained a particular feature.

i built a pipeline on GCP to fix it. the whole thing is three cloud functions, a bigquery dataset, and a cloud run frontend.

first cloud function takes a youtube url, pulls the full transcript, and writes it to a bigquery table with the video title, channel, date, speaker, and tags. second cloud function runs on a pub/sub trigger and does the text processing. it breaks the transcript into chunks, generates tsvector-style tokens, and updates a search-optimized table. third cloud function is an HTTP trigger that takes a search query and runs it against bigquery using SEARCH() on the transcript column.
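
the chunking step could look something like this in python. the chunk size, the overlap, and the choice to split on whitespace are assumptions, since the post doesn't say how the transcripts are actually split:

```python
def chunk_words(text, size=200, overlap=20):
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # stop once the remaining words are already covered by the previous chunk's tail
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]
```

a little overlap keeps a phrase that straddles a chunk boundary searchable in at least one chunk.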

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the first cloud function calls this to get the raw transcript. the python function is maybe 40 lines. requests library to call the api, bigquery client library to insert the row. i trigger it manually right now with a url parameter but eventually i'll hook it up to a google sheet where the content team can paste urls.

the bigquery part is where it gets interesting. bigquery added the SEARCH() function and search indexes (CREATE SEARCH INDEX) last year and they work surprisingly well for this. i created a search index on the transcript column and the queries come back in under 2 seconds even across 300 transcripts. not as fast as postgres FTS on a dedicated instance but for a serverless setup with zero infrastructure to manage it's good enough.

the frontend is a cloud run service. flask app with one search page. search box, results with video title, date, and a snippet of the transcript. the snippet extraction was the most annoying part because bigquery doesn't have ts_headline like postgres, so i wrote a python function that finds the match position and pulls 200 characters around it.
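
a sketch of that kind of snippet helper. this is not the actual function from the app, just one way to do "find the match and pull about 200 characters around it". it assumes a plain case-insensitive substring match:

```python
def snippet(text, query, width=200):
    """Return roughly `width` characters of `text` centered on the first match of `query`.

    Returns None when the query does not occur. Matching is a simple
    case-insensitive substring search, not real full-text matching.
    """
    pos = text.lower().find(query.lower())
    if pos == -1:
        return None
    # split the remaining width evenly on both sides of the match
    half = max((width - len(query)) // 2, 0)
    start = max(pos - half, 0)
    end = min(pos + len(query) + half, len(text))
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix
```

one caveat: lowercasing can change string length for a few unicode characters, so for non-english transcripts the offsets may need more care.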

the cost is basically nothing. bigquery on-demand pricing for the queries is pennies. cloud functions free tier covers the ingestion easily. cloud run bills per request and we get maybe 50 searches a day internally. my last invoice for this whole setup was $0.12.

about 300 videos indexed. the content team uses it to find existing content before creating new videos on topics we already covered. sales uses it before calls. someone from customer success started using it to find the exact timestamp where a feature was explained so they can send customers a link to that specific part of a recording.

u/straightedge23 — 7 days ago
▲ 11 r/FastAPI

built a property analysis microservice in fastapi and dependency injection made the whole thing surprisingly clean

a friend who invests in rental properties kept asking me to look up data on houses he was considering. zestimate, price trend, rent estimate, school ratings. he'd text me an address and i'd go manually check zillow. after the 50th time i figured i'd just build him something.

the backend is fastapi. i'm pulling property data from a rest api called zillapi that returns zillow data as json. 300+ fields per property. the fastapi part is what i want to talk about because dependency injection made this project way cleaner than i expected.

i set up the api client as a dependency. a single function that initializes the http client with the bearer token and base url. every endpoint that needs property data just declares it as a parameter. no global state, no passing clients around manually, no import spaghetti.

my main endpoints:

GET /property/{address} → full property summary
GET /compare?addresses=addr1&addresses=addr2 → side by side comparison
GET /cashflow/{address}?down_payment=25 → rental investment analysis

the cashflow endpoint is the one my friend uses most. it takes the rent estimate and asking price from the api response, calculates the mortgage at current rates, and returns monthly cash flow at whatever down payment percentage you pass in. the whole endpoint is about 30 lines including the response model.
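
the math behind an endpoint like that is the standard fixed-rate amortization formula, payment = P·r·(1+r)^n / ((1+r)^n − 1), with r the monthly rate and n the number of payments. a plain-python sketch, where the function names, the 30-year default, and the flat `expenses` parameter are assumptions rather than the actual endpoint code:

```python
def monthly_payment(principal, annual_rate, years=30):
    """Fixed-rate mortgage payment (principal and interest only)."""
    n = years * 12           # number of monthly payments
    r = annual_rate / 12     # monthly interest rate as a fraction
    if r == 0:
        return principal / n
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)


def monthly_cash_flow(price, rent, down_payment_pct, annual_rate, expenses=0.0, years=30):
    """Rent minus mortgage payment minus any other monthly expenses."""
    loan = price * (1 - down_payment_pct / 100)
    return rent - monthly_payment(loan, annual_rate, years) - expenses
```

as a sanity check, `monthly_payment(100_000, 0.06)` comes out to about 599.55, the textbook figure for $100k at 6% over 30 years. taxes, insurance, and vacancy would go into `expenses`.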

pydantic response models were the other win. the raw api response has 300+ fields but i only need about 20 for the frontend. i defined a PropertySummary model with just the fields i care about and fastapi handles the filtering automatically. the response is clean typed json that my react frontend can trust. no extra serialization code, no manual field picking.

i also made the comparison endpoint concurrent. when you compare 3-4 properties it makes multiple api calls. instead of awaiting them one at a time i use asyncio.gather so they all fire at once. comparing 4 properties takes about 2 seconds instead of 6-8.
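
a self-contained sketch of the pattern. `fetch_property` here is a stand-in that just sleeps instead of calling the real api, and both function names are made up:

```python
import asyncio
import time


async def fetch_property(address, delay=0.2):
    """Stand-in for one API call; the sleep simulates network latency."""
    await asyncio.sleep(delay)
    return {"address": address}


async def compare(addresses):
    """Start every lookup at once and wait for all of them.

    gather returns results in the same order as the inputs,
    regardless of which call finishes first.
    """
    return await asyncio.gather(*(fetch_property(a) for a in addresses))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(compare(["addr1", "addr2", "addr3", "addr4"]))
    print(len(results), "results in", round(time.perf_counter() - start, 2), "s")
```

total time is roughly the slowest single call rather than the sum of all calls, which is where the 6-8 seconds down to about 2 comes from.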

for the ai feature i set up a skill so he can also ask claude about properties:

npx clawhub@latest install zillow-full

the whole thing runs on a $5/month vps. my friend has been using it every morning for about a month. he checks 10-15 properties before he starts his actual job.

u/straightedge23 — 7 days ago
▲ 2 r/nextjs

server components + parallel routes made my property lookup app way faster than i expected

been building a tool for a real estate investor friend where he pastes a zillow address and gets a full breakdown. asking price vs zestimate, price history, rent estimate, school ratings, nearby comps. he was doing this manually across 6 zillow tabs every morning and it was eating his whole first hour.

i shipped the first version with client components and useEffect fetching. it worked but the initial page load was blank while it hydrated, the search had a loading spinner, and every property detail page was a separate client-side fetch. typical spa feel.

rewrote it with app router and the difference is significant. the property detail page is a server component that fetches at request time. no loading skeleton, no client javascript for the initial render. the data is just there when the page loads. i'm pulling property data from zillapi which returns 300+ fields as typed json, so the server component makes the fetch, picks the fields i need, and renders them. the client never sees the raw api response.

the part that made the biggest difference was parallel routes. the property page has three sections that load independently: the core details (instant from cache), the price history chart (needs a separate api call), and the neighborhood comps (another call). with parallel routes each section is its own slot with its own loading state, so it streams in as it resolves. the user sees the core details immediately while the chart and comps load in parallel. it feels fast even though the total data is the same.

for the ai analysis feature i installed a skill so my friend can ask claude about properties directly:

npx clawhub@latest install zillow-full

that gives 9 tools. but the main app is just next.js hitting the api with server components.

caching was the other win. i set property data to revalidate every 24 hours since zestimates don't change more than once a day anyway. so the second time anyone looks up the same address it loads instantly from the next.js cache. my friend looks up the same 10-15 properties repeatedly as he's deciding, so this made a noticeable difference for him.

he's been using it daily for about 3 weeks. showed it to two other investors at his meetup and now they want accounts. thinking about turning this into an actual saas but for now it's just a vercel deploy that costs me $0 on the hobby plan.

u/straightedge23 — 8 days ago

wrote a quick internal tool on a weekend and now it's used more than half the stuff our platform team built last quarter

i'm a senior engineer at a series B company, about 120 people. we have a youtube channel with around 250 videos. product demos, customer onboarding recordings, sales enablement content, engineering all-hands recordings, a bunch of conference talks our devrel team did. all of it is on youtube, some public some unlisted, and it's a mess to find anything.

the problem came up because a new PM joined and spent her first week asking people "do we have a video that explains X" over and over in slack. nobody could answer quickly because the videos are titled things like "Product Deep Dive - March 2024" which tells you nothing about what's actually covered.

i wrote a tool over a weekend. python script that pulls the full transcript from each video, stores it in postgres with the title, date, speaker, and tags. small flask frontend with a search box. you search for any phrase and it returns every video where someone said those words, with a snippet of the transcript around the match and a link to the video.

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

postgres full text search with tsvector handles the query side. GIN index on the transcript's tsvector. the search returns in under 50ms for 250 videos, which is more than fast enough.

the whole thing is maybe 400 lines of python. deployed it on an internal k8s cluster we already had, pointed an internal DNS record at it, posted the link in slack. that was it.

within a week the new PM had basically stopped asking slack questions about video content. then sales started using it to find specific product demo clips before calls. then customer success started searching for onboarding recordings to send to customers who missed their sessions. then someone from marketing found a conference talk where our CTO said something quotable and used it in a blog post.

i've spent zero time maintaining it. the only ongoing work is running the ingestion script when someone tells me there's a new video, which takes about 15 seconds.

the thing i keep thinking about is how our platform team spent most of Q1 building an internal developer portal with a design system and SSO integration and custom dashboards. it's polished and well-architected and almost nobody uses it daily. my janky flask app with no auth and a single text input gets used by 5 departments every day.

i'm not saying their work doesn't matter. the portal will matter long term. but there's something about small tools that solve one specific annoying problem that makes them stick in a way that big initiatives don't. every senior dev i know has a story like this and i'm curious whether other people have noticed the same pattern.

u/straightedge23 — 8 days ago

built a searchable index of all our vendor training videos and it cut down "how do i configure X" tickets by a lot

we have maybe 180 vendor training videos spread across youtube. fortinet, veeam, meraki, o365 admin walkthroughs, vmware stuff from before the broadcom mess. some recorded by vendors during onboarding, some are conference sessions our team found useful, some are internal recordings of vendor calls we screen captured. all sitting in a shared youtube playlist that nobody ever scrolls through.

the problem is every time someone on the team needs to configure something they haven't touched in a while, they either ask me, open a ticket, or spend 20 minutes scrubbing through a 45 minute vendor video trying to find the 2 minutes where the guy talks about the specific setting. or they just wing it.

i spent a saturday writing a python script to fix this. the script pulls the full transcript from each video and dumps it into a postgres table with the video title, vendor, topic tags, and the youtube link. then i put a small flask app on top of it with one search box. you type "site to site vpn phase 2" and it gives you every video where someone said those words, with the snippet highlighted and a link to the video.

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the python side is just requests to call the api and psycopg2 for postgres. maybe 50 lines for the ingestion script. i added a batch mode that reads urls from a text file because i wasn't going to paste 180 links one at a time.

postgres has full text search built in which is what makes it actually work. tsvector on the transcript column with a GIN index. searches come back in milliseconds. the flask app is one file, maybe 70 lines. i deployed it on an ubuntu VM we already had running nagios and threw nginx in front of it.

the part that surprised me was how much the junior admins use it. they search for specific error messages and find the vendor video where someone walks through that exact scenario. one of them told me he found a video about configuring DFS replication that he didn't even know we had. it's also useful before vendor calls because you can search for what was said in previous calls and not ask the same questions twice.

about 180 videos indexed. the whole thing runs on the same VM as our monitoring stack. took a saturday afternoon.

u/straightedge23 — 9 days ago
▲ 0 r/node

replaced a 600 line puppeteer zillow scraper with a single fetch call

inherited a node project at work that scrapes zillow for property data. previous dev built it about a year ago and it's been my problem for the last 4 months. puppeteer with stealth plugin, rotating residential proxies, a custom parser for each property type, retry logic, a queue system in bull, the works. 600+ lines just for the scraping layer.

it broke constantly. like every 2-3 weeks something on zillow's end would change and the selectors would stop working. or our proxy provider would rotate to a bad ip range and we'd get captcha walls for a day. i was spending more time maintaining this scraper than building actual features for our product.

last month i finally killed it and switched to zillapi. it's a rest api that returns zillow property data as json. you send it a zillow url or an address, you get back zestimate, price history, photos, schools, taxes, everything. typed json, same shape every time regardless of property type.

my scraping layer went from 600 lines to about 40. just a fetch call with a bearer token, some light error handling, and a transform function that maps their field names to our internal schema. that's it. no puppeteer, no proxies, no bull queue, no selector maintenance.

the part that still annoys me is i could have done this months ago. i kept thinking "i'll fix the scraper one more time" instead of just accepting that maintaining a zillow scraper is a losing game. every fix lasted 2-3 weeks and then something else would break.

performance is better too. the api responds in 1-2 seconds. our puppeteer setup was taking 8-15 seconds per page because it had to fully render the js and wait for the zestimate widget to load. our job queue used to back up during peak hours. that doesn't happen anymore.

we uninstalled puppeteer, puppeteer-extra, puppeteer-extra-plugin-stealth, and the proxy library. four fewer dependencies. node_modules lost about 200MB. our docker image shrank from 1.2GB to 340MB because we don't need chromium anymore.

i mass deleted the scraper directory, the proxy config, and the selector test files. felt incredible.

anyone else here still maintaining a scraper they should probably just replace?

u/straightedge23 — 9 days ago
▲ 15 r/ruby

wrote a ruby script to index youtube video transcripts into sqlite and it's become our most used internal tool

i work at a small consultancy and we record a lot of internal stuff on youtube. client workshop recordings, internal tech talks, vendor product demos, conference talks people found useful. all unlisted, shared through slack. the problem was the same one everyone has: 200+ videos and nobody can find anything.

i wrote a ruby script to fix it one friday afternoon.

the script takes a youtube url, pulls the full transcript, and inserts it into a sqlite database along with the video title, date, tags, and the youtube link. i wrote a small sinatra app on top of it for search. one page, one text box, results come back with the video title, date, and a snippet of the transcript around the match.

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the ruby side is net/http to call the api, json.parse for the response, and the sqlite3 gem for the database. the insert script is about 40 lines. i added a batch mode that reads urls from a text file so we could backfill the existing library.
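
the batch mode is the only part with any logic to it. a rough python sketch of the idea (the actual script is ruby): `index_video` is a hypothetical callable standing in for "fetch the transcript and insert the row", and skipping blank lines, `#` comments, and duplicate urls is an assumption about what a url file might contain.

```python
def read_urls(lines):
    """Return unique, non-empty URLs in order, skipping '#' comment lines.

    `lines` can be any iterable of strings, including an open text file.
    """
    urls = []
    seen = set()
    for line in lines:
        url = line.strip()
        if not url or url.startswith("#") or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls


def backfill(lines, index_video):
    """Index every URL, collecting failures instead of stopping at the first one."""
    indexed, failed = [], []
    for url in read_urls(lines):
        try:
            index_video(url)
            indexed.append(url)
        except Exception as exc:  # one bad video should not abort the backfill
            failed.append((url, str(exc)))
    return indexed, failed
```

collecting failures matters for a backfill of 200+ videos, since some will have no transcript and you want a list to retry rather than a crash halfway through.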

sqlite has FTS5 for full text search which is what makes this actually useful. the search runs a MATCH query on the transcript column and comes back in a few milliseconds even with 200+ rows. someone searches "kubernetes pod networking" and gets every video where someone said those words.
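
here's roughly what the FTS5 part looks like, sketched in python with the standard library `sqlite3` module instead of the ruby gem. the table and column names are assumptions, and it needs a sqlite build with FTS5 enabled (most are). title and url are marked `UNINDEXED` so MATCH only looks at the transcript.

```python
import sqlite3


def build_index(videos):
    """Create an in-memory FTS5 table and load (title, url, transcript) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE videos "
        "USING fts5(title UNINDEXED, url UNINDEXED, transcript)"
    )
    conn.executemany(
        "INSERT INTO videos (title, url, transcript) VALUES (?, ?, ?)", videos
    )
    conn.commit()
    return conn


def search(conn, query, limit=10):
    """Run a MATCH query; return (title, url, excerpt) rows, best match first."""
    sql = """
        SELECT title, url,
               snippet(videos, 2, '[', ']', '...', 12) AS excerpt
        FROM videos
        WHERE videos MATCH ?
        ORDER BY rank
        LIMIT ?
    """
    return conn.execute(sql, (query, limit)).fetchall()
```

space-separated words in an FTS5 query are an implicit AND, which is why "kubernetes pod networking" only returns videos containing all three words. `snippet()` produces the excerpt around the match, with column index 2 being the transcript. raw user input can contain FTS5 syntax characters (quotes, `*`, `:`), so a real search box would want to sanitize or quote the query first.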

the sinatra app is one file. maybe 60 lines including the erb template inline. i used shotgun for development and deployed it with puma behind nginx on a small VPS we already had running other internal stuff.

about 230 videos indexed now. the consultants use it before client calls to look up whether we've covered a topic in a previous workshop. the engineering team uses it to find internal tech talks. one of the partners started using it to find specific things he said in recorded presentations which i thought was funny.

the part i like about this project is how little ruby you need. no framework, no ORM, no background job system. just a script and a sinatra app. the whole thing is two files and a gemfile with three gems.

u/straightedge23 — 9 days ago

started pulling transcripts from our training videos before building courses and it cut my development time in half

i'm an instructional designer at a mid-size tech company. we have about 300 youtube videos from the last 3 years. internal training recordings, SME walkthroughs, onboarding sessions, product demos. most of them are unlisted and shared through our LMS.

the problem i kept running into was that every time i needed to build a new course or update an existing one, i had to rewatch hours of video to find the content i needed. an SME did a 45 minute walkthrough of a feature 8 months ago and now i need to turn that into a 10 minute module. so i'd sit there scrubbing through the video, taking notes, pausing every 30 seconds to type something.

i started pulling full transcripts from the videos instead. now when i need to build a course on a topic i search the transcripts first to find every video where someone explained it. i read through the relevant parts, pull out the key points, and have my outline done in 20 minutes instead of spending 3 hours watching and rewatching recordings.

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

i've transcribed about 180 of our 300 videos so far. i keep them in a sharepoint folder organized by topic. each transcript is a text file named with the video title and date.

the biggest time saver is when i need to find specific things an SME said. my manager will ask "did anyone cover the new compliance process in a training video" and instead of watching 10 videos hoping to find it i just search the transcript folder for "compliance" and get every mention in seconds.

the other thing this fixed was accessibility. we were supposed to have captions on all training videos for ADA compliance but we were way behind. having the full transcript made it easy to upload captions to the videos. we went from maybe 30% captioned to 90% in about two weeks.

i also started using the transcripts as first drafts for written job aids. an SME explains a process verbally in a video and i clean up the transcript into a step-by-step document. way faster than writing from scratch or scheduling another meeting with the SME to walk me through it again.

the ID workflow used to be: watch video, take notes, outline, draft. now it's: search transcripts, read relevant sections, outline, draft. the first two steps went from hours to minutes.

u/straightedge23 — 11 days ago

Today I bought this

For weeks I was looking for some Hot Wheels that would just make me buy them, and today I got this Fast and Furious pack

u/straightedge23 — 12 days ago
▲ 138 r/NewTubers

started turning my youtube videos into blog posts using the transcripts and my search traffic doubled in 3 months

i have a small tech channel. about 3.2k subs, been posting for about a year. growth was slow but steady. most of my traffic came from youtube search and suggested videos. basically zero traffic from google.

a few months ago i read a post somewhere about how google indexes blog content way better than youtube videos. so if you have a video about "how to set up nginx reverse proxy" and someone googles that exact phrase, your blog post will show up on page one but your youtube video probably won't unless they specifically search youtube.

that made me think. i already have the content. it's in my videos. i just need it in written form too.

so i started pulling transcripts from my own videos and turning them into blog posts. not just pasting the transcript as-is because that reads terribly. i use the transcript as a starting point, clean it up, add headings, fix the parts where i rambled, and add screenshots where i was showing something on screen.

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

each blog post takes me about 45 minutes. maybe 5 minutes to pull the transcript and 40 minutes to edit it into something readable. way faster than writing a blog post from scratch which used to take me 3-4 hours.

i've converted about 25 of my videos into blog posts so far. here's what happened:

my blog gets about 1,800 visits a month from google now. before this it was getting maybe 400. most of those visitors land on a blog post, read the written version, and some percentage click through to watch the video. my youtube views from external sources went up about 35%.

the other thing i didn't expect is that having a blog post for each video helps with youtube SEO too. i link the video in the blog post and the blog post in the video description. a few of the blog posts got backlinks from other sites which i think helped the youtube videos rank better too.

the posts that do best on google are the how-to and tutorial ones. my opinion videos and vlogs don't get any search traffic as blog posts which makes sense.

i'm not doing anything complicated. wordpress site on cheap hosting. the transcript gives me 80% of the blog post and i just clean up the other 20%. the hardest part is adding screenshots but even that's just taking a few screenshots from the video and dropping them in.

if you're making tutorial or educational content and you're not repurposing into blog posts you're leaving search traffic on the table. you already did the work. the content exists. it just needs to be in a format google can read.

u/straightedge23 — 14 days ago

i manage developer relations at a small company and we produce a lot of youtube content. tutorials, livestreams, conference talks, product demos. we had about 250 videos on the channel and the problem was always the same. someone on the team would need to find where we explained a specific feature or answered a specific question and there was no way to search for it besides scrolling through video titles and guessing.

i built a simple node app that stores full video transcripts in mongodb and uses text indexes to make them searchable. took an evening.

each document in the collection looks like: title, channel, publishDate, tags array, youtubeUrl, and a transcript field with the full text. i created a text index on title and transcript with weights so title matches rank higher.
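
a sketch of the document shape and the weighted text index, written in python in pymongo style (the real app is node). the weight values 10 and 1 are assumptions picked for illustration, and `collection` is expected to be a pymongo collection, where `create_index` passes `weights` through as an index option.

```python
# Both fields go into one text index; MongoDB allows one text index per collection.
TEXT_INDEX_KEYS = [("title", "text"), ("transcript", "text")]

# Assumed weights: a match in the title counts 10x a match in the transcript.
TEXT_INDEX_WEIGHTS = {"title": 10, "transcript": 1}


def build_document(title, channel, publish_date, tags, youtube_url, transcript):
    """Build one video document with the fields described above."""
    return {
        "title": title,
        "channel": channel,
        "publishDate": publish_date,
        "tags": list(tags),
        "youtubeUrl": youtube_url,
        "transcript": transcript,
    }


def ensure_text_index(collection):
    """Create the weighted text index (a no-op if the same index already exists)."""
    return collection.create_index(TEXT_INDEX_KEYS, weights=TEXT_INDEX_WEIGHTS)
```

the weights only affect the relevance score, not which documents match.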

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

the insert script is maybe 30 lines. call the api with the youtube url, get the transcript back, build the document, insertOne. i added a bulk mode that reads from a json file of urls for the initial backfill.

the search is a $text query with $meta textScore for sorting. an express endpoint takes a query string, runs the text search, and returns results sorted by relevance. the response includes the video title, date, score, and a truncated chunk of the transcript around the first occurrence of the search terms. frontend is a single html page with a search box.
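
the query and the snippet logic, sketched in python rather than express. the projected field names follow the document shape above, `radius` is an assumed default, and `collection` is expected to be a pymongo collection. treat it as an illustration of the idea, not the app's actual code.

```python
def text_search_args(query):
    """Build the filter, projection, and sort for a relevance-ranked $text search."""
    score = {"$meta": "textScore"}
    return {
        "filter": {"$text": {"$search": query}},
        "projection": {"title": 1, "publishDate": 1, "transcript": 1, "score": score},
        "sort": [("score", score)],
    }


def snippet_around(text, terms, radius=80):
    """Return a chunk of text centered on the earliest occurrence of any term."""
    lowered = text.lower()
    positions = [lowered.find(term.lower()) for term in terms]
    positions = [p for p in positions if p != -1]
    if not positions:
        return text[: 2 * radius]  # no literal hit: fall back to the opening
    center = min(positions)
    start = max(0, center - radius)
    end = min(len(text), center + radius)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


def search(collection, query, limit=10):
    """Run the text search and attach a snippet to each result."""
    args = text_search_args(query)
    cursor = (
        collection.find(args["filter"], args["projection"])
        .sort(args["sort"])
        .limit(limit)
    )
    terms = query.split()
    return [
        {
            "title": doc["title"],
            "date": doc.get("publishDate"),
            "score": doc["score"],
            "snippet": snippet_around(doc["transcript"], terms),
        }
        for doc in cursor
    ]
```

one caveat: mongodb text search stems words, so a document can match on "configuring" when the query said "configuration". the snippet function does a literal substring search, which is why it falls back to the start of the transcript when it can't find the exact term.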

what surprised me is how good mongodb text search is for this use case. i assumed i'd need elasticsearch for anything involving searching through long documents. but with 250 documents and a text index on the transcript field, queries come back in under 50ms. searching "authentication webhook setup" returns every video where someone explained that topic, ranked by relevance.

the team uses it constantly now. the support team searches for answers before responding to customer questions. the marketing team finds old videos to reference in blog posts. the devrel team uses it to avoid repeating content we've already covered.

about 280 videos indexed now. the collection is maybe 40mb total. running on a free atlas tier which handles our read volume without any issues. maybe 50-60 searches a day across 8 people.

the only limitation i've hit is that mongodb text search doesn't support phrase proximity. if someone searches "rate limiting configuration" it finds documents with all three words but they might be in different paragraphs. for our use case that's fine because the transcript is usually about one topic so the words are close together anyway. but if i needed more precise matching i'd probably add atlas search with lucene analyzers.
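
short of moving to atlas search, one cheap workaround would be an app-side proximity check on the results mongodb returns. to be clear, this is not something mongodb text search does and not something i've deployed. it's a python sketch that checks whether all the search words appear within some window of each other, with the window size of 30 words being an arbitrary assumption. it matches exact words only, no stemming.

```python
import re


def terms_within_window(text, terms, window=30):
    """True if every term occurs inside some span of at most `window` words."""
    words = re.findall(r"\w+", text.lower())
    wanted = {term.lower() for term in terms}
    if not wanted:
        return True
    counts = {}  # how many times each wanted word is inside the current window
    left = 0
    for right, word in enumerate(words):
        if word in wanted:
            counts[word] = counts.get(word, 0) + 1
        # Shrink from the left until the window holds at most `window` words.
        while right - left + 1 > window:
            dropped = words[left]
            if dropped in wanted:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    del counts[dropped]
            left += 1
        if len(counts) == len(wanted):
            return True
    return False
```

you'd run the normal $text query first and then filter or re-rank the handful of hits with this, so the extra pass only touches a few transcripts per search.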

u/straightedge23 — 15 days ago

i work in content marketing for a B2B company and part of my job is tracking what competitors are publishing on youtube. not just video titles but what they're actually covering in the videos. my manager kept asking things like "what topics are competitors talking about this quarter" and "are we covering the same things or are there gaps." i was giving vague answers because i didn't have the data to back anything up.

so i built a dataset and a power bi dashboard around it.

the data source is youtube video transcripts from competitor channels. i track 8 channels in our space. for each video i pull the full transcript, video title, channel name, publish date, and i manually tag 1-3 topic categories per video.

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

i store everything in a sharepoint list. each row is one video with columns for title, channel, date, topics, and the transcript text. power bi connects to the list and refreshes daily.

the dashboard has a few pages. the first page is a bar chart showing topic frequency across all competitors. so i can see that "AI integration" appeared in 34 videos this quarter while "data migration" only showed up in 6. the second page is a matrix view broken down by competitor and topic so i can see which channels are heavy on which subjects. the third page is a timeline showing when topics spike. when 4 competitors all start talking about the same thing in the same month there's usually something happening in the market.

the part my manager actually uses the most is a simple table with a slicer for topic and competitor. she filters to a specific competitor and topic and reads through the transcript snippets. she stopped asking me to summarize competitor videos because she can just look it up herself.

about 240 videos in the dataset now going back about 8 months. adding new ones takes maybe 15 minutes a week. pull transcripts, paste into sharepoint, tag topics.

the dashboard itself is pretty basic power bi. nothing fancy with DAX. the bar charts and matrix are standard visuals. the topic slicer is just a regular slicer on the category column. the hardest part was getting the data, not building the report.

u/straightedge23 — 16 days ago
▲ 2 r/framer

Built my SaaS landing page on Framer about 8 months ago. SEO has been decent, we rank page 1 for a few keywords. But I recently tested what happens when you ask ChatGPT, Claude, Gemini, and Perplexity questions like "best project management tool for small teams" (our space). We don't get mentioned once. Two competitors on Webflow and one on WordPress show up in almost every answer. Is this a Framer thing or am I missing something else entirely?

EDIT:

a few people mentioned ai visibility audits so i tried one. used landkit, it's free. you just paste your url and it checks all four engines at once. scored 31 out of 100. the competitor on webflow that keeps showing up in chatgpt answers? 67. so it's definitely not a framer-specific problem, my content apparently just isn't structured the way these AI engines want

u/straightedge23 — 18 days ago
▲ 3 r/trello

i manage content for a B2B company and part of my job is tracking what competitors and industry people are saying in youtube videos. product announcements, conference talks, webinar recordings. the kind of stuff you can't just skim a blog post for.

i was keeping track of all this in a google doc that was basically a graveyard. video title, link, and my notes which were always something useless like "talked about pricing strategy and market positioning" which tells me nothing 3 weeks later when i actually need the details.

moved the whole system into trello. each card is one video. the title is the video title and channel name. i use labels for topic categories like "pricing," "product launch," "industry trends," "competitor." the description field has the full transcript of the video.

for pulling transcripts i use transcript api:

npx skills add ZeroPointRepo/youtube-skills --skill youtube-full

paste the url, grab the transcript, drop it in the card description. maybe 2 minutes per video including tagging.

the reason trello works better than my google doc is the search. trello searches inside card descriptions. so when my boss asks "what did [competitor] say about their enterprise pricing change" i search "enterprise pricing" and get back every card where someone mentioned it. not just the ones i remembered to tag correctly. the full transcript is in there so it catches everything.

i also set up a board view with lists by month so i can see what was added recently. and a separate list called "needs review" where i dump cards quickly when i don't have time to tag them properly. i get to them later.

about 130 videos tracked across 6 months. my team has view access so when someone needs to prep for a sales call against a specific competitor they filter by that label and read through the transcripts instead of watching hours of video. a couple people on the team started adding their own cards too which i didn't expect.

the whole setup took maybe 30 minutes. it's just a board with labels and a search habit. nothing complicated about it.

u/straightedge23 — 18 days ago