u/Sea_Lawfulness_5602

Struggling to find the perfect Search/Scraping API

Hey everyone,

I'm building an AI fact-checking pipeline to verify video claims.
The logic is solid, but the Web Search/Extraction layer is a nightmare. Here is our experience so far:

Tavily: Perfect high-tier sources, but way too expensive at scale.
Exa.ai: Fast, but their neural search pulls too many low-tier blogs/forums instead of authoritative news, even with strict prompting.
Jina API: Cheap, with good markdown output, but it rate-limits instantly on parallel queries. Payloads are also chaotic (it burns millions of tokens on massive PDFs, or returns zero content).
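The rate-limit and payload problems with parallel extraction calls are often handled client-side, whatever the provider. Below is a minimal Python sketch, assuming a hypothetical async `fetch(url)` coroutine that wraps your extraction API and raises a hypothetical `RateLimited` exception on HTTP 429. It caps in-flight requests with `asyncio.Semaphore`, retries with exponential backoff plus jitter, and truncates oversized payloads to a character budget before they reach the LLM. The limits are placeholders, not any provider's documented quotas.

```python
import asyncio
import random
from typing import Awaitable, Callable, List


class RateLimited(Exception):
    """Hypothetical error your fetch wrapper raises on an HTTP 429."""


async def fetch_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 4,
    max_retries: int = 3,
    base_delay: float = 0.5,
    max_chars: int = 20_000,
) -> List[str]:
    """Fetch every URL with bounded concurrency, retries, and a size cap."""
    sem = asyncio.Semaphore(max_concurrency)  # at most N requests in flight

    async def one(url: str) -> str:
        async with sem:
            attempt = 0
            while True:
                try:
                    text = await fetch(url)
                    # Cap huge payloads (e.g. a converted PDF) before the LLM sees them.
                    return text[:max_chars]
                except RateLimited:
                    if attempt >= max_retries:
                        raise
                    # Exponential backoff with jitter; the slot stays held, so the
                    # whole pool slows down while the provider is throttling.
                    delay = base_delay * (2 ** attempt) * (1 + random.random())
                    await asyncio.sleep(delay)
                    attempt += 1

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(one(u) for u in urls))
```

Holding the semaphore during backoff is a deliberate choice here: it lowers the effective request rate when the API pushes back, at the cost of some throughput.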

The Goal: We need an API that guarantees top-tier domains (Reuters, Gov, AP), extracts clean text/markdown, handles async concurrency, and doesn't break the bank.
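Whichever provider ends up in the stack, a post-filter on result URLs is one way to enforce the "top-tier domains only" requirement instead of relying on prompting. A minimal sketch, assuming each search hit is a dict with a `url` key; the allowlist is illustrative and would need extending (for example, non-US government sites such as `gov.uk` are not matched by the `gov` entry).

```python
from typing import Dict, Iterable, List, Sequence
from urllib.parse import urlparse

# Hypothetical allowlist; "gov" matches any *.gov host.
TRUSTED_DOMAINS = ("reuters.com", "apnews.com", "gov")


def is_trusted(url: str, allowlist: Sequence[str] = TRUSTED_DOMAINS) -> bool:
    """True if the URL's host is an allowlisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Match whole labels so "notreuters.com" does not pass as "reuters.com".
    return any(host == d or host.endswith("." + d) for d in allowlist)


def keep_trusted(results: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop search hits whose 'url' field points outside the allowlist."""
    return [r for r in results if is_trusted(r.get("url", ""))]
```

Because filtering happens after retrieval, a query can end up with few or no usable hits, so it may be worth over-fetching and then filtering.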

Currently considering the Perplexity Search API or a DIY Brave Search + Firecrawl stack.

Has anyone built a high-volume RAG pipeline recently? What is the golden stack for Web Search right now?

Thanks

reddit.com

u/Sea_Lawfulness_5602 — 1 day ago

▲ 3 r/digital_ocean

How to unlock "Premium AMD" droplets on a Student Account?

I currently have a student account with free credits, but I can't create a "Basic - Premium AMD" droplet.

Will adding manual funds unlock this?
Do I need to upgrade to Tier 2?
Or are Premium AMD droplets completely restricted for student accounts?

Thanks

u/Sea_Lawfulness_5602 — 4 days ago

▲ 1 r/MachineLearning

Architecture advice: Real-time pipeline for YouTube Audio -> Whisper -> LLM -> SSE (Sub-10s latency) [D]

Hey everyone, I’m building a backend that analyzes long YouTube videos using an LLM.

Currently, my flow is a slow waterfall: Download full audio -> Whisper -> LLM -> Return results. For a 30-minute video, the user sees nothing until every step has finished.

I want to pipeline this for real-time SSE streaming: [Chunk Audio on the fly] -> [Whisper] -> [LLM] -> [Stream to UI]
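One way to turn the waterfall into that pipeline without extra infrastructure is a chain of `asyncio.Queue`s with one task per stage, so chunk N+1 is transcribed while chunk N is with the LLM. A minimal sketch, assuming hypothetical async `transcribe(chunk)` and `analyze(text)` callables (your Whisper and LLM wrappers) and an iterable of already-cut audio chunks; each LLM result is emitted as a Server-Sent Events `data:` frame.

```python
import asyncio
import json
from typing import Any, AsyncIterator, Awaitable, Callable, Iterable

_DONE = object()  # sentinel passed down the chain when input is exhausted


async def _stage(inbox: asyncio.Queue, outbox: asyncio.Queue,
                 work: Callable[[Any], Awaitable[Any]]) -> None:
    """Pull items from inbox, process them, push results to outbox in order."""
    while True:
        item = await inbox.get()
        if item is _DONE:
            await outbox.put(_DONE)
            return
        await outbox.put(await work(item))


async def run_pipeline(
    chunks: Iterable[Any],
    transcribe: Callable[[Any], Awaitable[str]],
    analyze: Callable[[str], Awaitable[Any]],
) -> AsyncIterator[str]:
    """Yield one SSE frame per analyzed chunk as soon as it is ready."""
    # Bounded queues apply backpressure if a downstream stage falls behind.
    audio_q: asyncio.Queue = asyncio.Queue(maxsize=4)
    text_q: asyncio.Queue = asyncio.Queue(maxsize=4)
    result_q: asyncio.Queue = asyncio.Queue()

    async def feed() -> None:
        for chunk in chunks:
            await audio_q.put(chunk)
        await audio_q.put(_DONE)

    tasks = [
        asyncio.create_task(feed()),
        asyncio.create_task(_stage(audio_q, text_q, transcribe)),
        asyncio.create_task(_stage(text_q, result_q, analyze)),
    ]
    try:
        while True:
            result = await result_q.get()
            if result is _DONE:
                break
            # SSE wire format: "data: <payload>" followed by a blank line.
            yield f"data: {json.dumps(result)}\n\n"
    finally:
        for t in tasks:
            t.cancel()  # no-op for finished tasks; stops stages if the client leaves
```

In FastAPI the generator can be returned as `StreamingResponse(run_pipeline(...), media_type="text/event-stream")`. Whisper inference is CPU/GPU-bound, so a blocking call inside `transcribe` would stall the event loop; wrapping it in `asyncio.to_thread` (or moving it to a worker process) keeps the loop responsive. Error propagation between stages is left out of this sketch.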

My questions for the data/backend engineers:

Chunking & VAD: What's the best way to chunk YouTube audio streams (e.g., via ffmpeg) without cutting sentences in half and ruining the LLM's context?
Queueing: Is standard asyncio in FastAPI enough to handle these overlapping tasks, or do I strictly need Celery/Redis workers for this pipeline?
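For the chunking question, one common heuristic is to cut only at pauses. ffmpeg's `silencedetect` audio filter logs `silence_start`/`silence_end` timestamps to stderr; the sketch below parses that log and picks cut points that keep each chunk under a maximum length (and all but the last above a minimum), falling back to a hard cut when no pause is found. It assumes the audio, or a buffered segment of it, is available as a file; the noise threshold, pause duration, and chunk lengths are placeholder values, and this is a silence heuristic rather than a trained VAD model.

```python
import re
import subprocess
from typing import List, Tuple

_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
_END = re.compile(r"silence_end:\s*(\d+(?:\.\d+)?)")


def detect_silences(path: str, noise: str = "-30dB", min_pause: float = 0.5) -> str:
    """Run ffmpeg's silencedetect filter and return its stderr log."""
    cmd = ["ffmpeg", "-hide_banner", "-nostats", "-i", path,
           "-af", f"silencedetect=noise={noise}:d={min_pause}",
           "-f", "null", "-"]
    return subprocess.run(cmd, capture_output=True, text=True).stderr


def parse_silences(log: str) -> List[Tuple[float, float]]:
    """Extract (start, end) pairs in seconds from a silencedetect log."""
    silences: List[Tuple[float, float]] = []
    start = None
    for line in log.splitlines():
        if (m := _START.search(line)):
            start = float(m.group(1))
        elif (m := _END.search(line)) and start is not None:
            silences.append((start, float(m.group(1))))
            start = None
    return silences


def choose_cuts(silences: List[Tuple[float, float]], total: float,
                min_len: float = 20.0, max_len: float = 40.0) -> List[float]:
    """Cut times (seconds) at mid-pause; chunks are <= max_len, and all but
    the last are >= min_len."""
    mids = sorted((s + e) / 2 for s, e in silences)
    cuts: List[float] = []
    last = 0.0
    while total - last > max_len:
        # First pause that leaves at least min_len seconds in the current chunk.
        nxt = next((m for m in mids if m >= last + min_len), None)
        if nxt is None or nxt > last + max_len:
            nxt = last + max_len  # no pause in the window: hard cut
        cuts.append(nxt)
        last = nxt
    return cuts
```

Cutting mid-pause keeps sentences intact more often than fixed-length slicing, though a pause is not guaranteed to be a sentence boundary.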

Any library recommendations or architectural patterns would be hugely appreciated.

u/Sea_Lawfulness_5602 — 4 days ago

▲ 18 r/jordan

How true is this news, out of ten?

😂😂😂😂😂😂

u/Sea_Lawfulness_5602 — 5 days ago

▲ 0 r/jordan

Veryyyy urgent

I need a "طز" (roughly "who cares") sticker urgently, please

Ideally it should be expressive, with some deeper meaning and wisdom behind it

Urgent, life or death 😔

u/Sea_Lawfulness_5602 — 8 days ago

▲ 2 r/jordan

A psychological complex

My wish is to succeed, not for money or power or anything like that
I just want to succeed to show that my life is amazing and to spite a few people
Does that mean I have a psychological complex and need therapy?
Late-night questions

u/Sea_Lawfulness_5602 — 15 days ago

▲ 2 r/learnprogramming

I need to extract video metadata and transcripts from YouTube, Instagram Reels, and TikTok.

Most "all-in-one" APIs are brittle and fail when native captions are missing. Running yt-dlp locally leads to instant 403/IP bans on Meta platforms, and enterprise solutions like Apify are too costly for this stage.

I have a local Whisper pipeline ready but need a stable fetcher for raw media links (.mp4/.mp3) that handles IG/TikTok without blocks. I'm looking for a pay-as-you-go solution that is stable for non-English content.

How are you handling multi-platform video data in 2026? Is there a "Goldilocks" API available, or is a custom proxy wrapper the only real way to survive?
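If a custom wrapper does turn out to be necessary, its core is usually retry logic that rotates through a proxy pool on block responses. A minimal sketch, assuming a hypothetical `fetch(url, proxy)` callable that returns `(status_code, body)`; the proxy strings and the set of status codes treated as blocks are placeholders, and this does nothing about fingerprinting or login walls.

```python
import itertools
import time
from typing import Callable, List, Optional, Tuple

# Status codes treated as "blocked, try another exit IP" (an assumption).
BLOCK_STATUSES = {403, 429}


def fetch_with_rotation(
    url: str,
    proxies: List[str],
    fetch: Callable[[str, str], Tuple[int, bytes]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Try the URL through successive proxies, backing off between blocks."""
    if not proxies:
        raise ValueError("need at least one proxy")
    pool = itertools.cycle(proxies)
    status: Optional[int] = None
    for attempt in range(max_attempts):
        status, body = fetch(url, next(pool))
        if status == 200:
            return body
        if status not in BLOCK_STATUSES:
            break  # e.g. 404: a different IP will not help
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff before retrying
    raise RuntimeError(f"giving up on {url}: last status {status}")
```

A real `fetch` could be built on the standard library's `urllib.request.ProxyHandler` or on a `requests` call with a `proxies` mapping; injecting it keeps the retry policy testable on its own.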

u/Sea_Lawfulness_5602 — 15 days ago

▲ 3 r/SaaS

I need to extract video metadata and transcripts from YouTube, Instagram Reels, and TikTok.

How are you handling multi-platform video data in 2026? Is there a "Goldilocks" API available, or is a custom proxy wrapper the only real way to survive?

u/Sea_Lawfulness_5602 — 15 days ago

▲ 1 r/Backend

Hey everyone,

I'm currently building an AI-powered app (Flutter frontend + Python/FastAPI backend). Right now, the app successfully analyzes long-form YouTube videos by fetching their transcripts and running them through an LLM pipeline.

However, I want to expand the app to support short-form content (TikTok, Instagram Reels, YouTube Shorts) where captions aren't always reliably available via APIs.

The desired workflow:

User pastes a TikTok or IG Reel URL into the app.
The backend downloads/extracts only the audio (e.g., MP3/M4A) from that URL.
The backend runs the audio through a Speech-to-Text model (like Whisper) to get the transcript.
The transcript is fed into my existing LLM pipeline.
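For step 2, yt-dlp can be driven from Python instead of the CLI. A minimal sketch, assuming yt-dlp and ffmpeg are installed; the option keys (`format`, `outtmpl`, `noplaylist`, `postprocessors` with `FFmpegExtractAudio`) are yt-dlp's own, but whether TikTok or IG downloads succeed from a server IP is exactly the open question here, and this sketch does not solve it. The helper names are hypothetical.

```python
import glob
import os
from typing import Any, Dict


def build_audio_opts(out_dir: str, codec: str = "m4a") -> Dict[str, Any]:
    """yt-dlp options: best audio-only stream, converted to `codec` by ffmpeg."""
    return {
        "format": "bestaudio/best",  # audio-only if offered, else best combined
        "outtmpl": os.path.join(out_dir, "%(id)s.%(ext)s"),
        "noplaylist": True,  # a pasted Reel/TikTok URL should yield one file
        "quiet": True,
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": codec},
        ],
    }


def download_audio(url: str, out_dir: str) -> str:
    """Download one URL's audio into out_dir and return the file path."""
    import yt_dlp  # imported lazily so the options helper works without it

    with yt_dlp.YoutubeDL(build_audio_opts(out_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
    # The post-processor may change the extension, so look the file up by id.
    pattern = os.path.join(glob.escape(out_dir), f"{glob.escape(info['id'])}.*")
    matches = glob.glob(pattern)
    if not matches:
        raise FileNotFoundError(f"no audio written for {url}")
    return matches[0]
```

The `FFmpegExtractAudio` post-processor shells out to ffmpeg, so ffmpeg must be on the server's PATH.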

My questions for the community:

Extraction: I know IG and TikTok are notoriously aggressive against scraping. Is yt-dlp still the most reliable tool for extracting audio from these platforms in a production backend, or are there better alternatives/APIs?
Transcription: For the STT part, is it better (cost/speed-wise) to use OpenAI's Whisper API directly, or host a smaller Whisper model locally on my server (e.g., using faster-whisper) since these are just 15-60 second clips?
Infrastructure: Any tips on handling the temporary audio files? Should I process this entirely in memory (RAM), or save to /tmp and delete after transcription?
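On the temporary-file question, one low-risk pattern is `tempfile.TemporaryDirectory`, which deletes the clip even if transcription fails. A minimal sketch, assuming the downloaded audio arrives as bytes; the `faster_whisper_transcriber` helper uses faster-whisper's `WhisperModel`, with the `small` model size and `int8` CPU compute type as placeholder choices rather than a benchmarked recommendation.

```python
import tempfile
from pathlib import Path
from typing import Callable


def transcribe_bytes(audio: bytes, transcribe: Callable[[Path], str],
                     suffix: str = ".m4a") -> str:
    """Write audio to a private temp dir, transcribe it, then delete it."""
    # The directory and its contents are removed when the with-block exits,
    # including when transcribe() raises.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"clip{suffix}"
        path.write_bytes(audio)
        return transcribe(path)


def faster_whisper_transcriber(model_size: str = "small") -> Callable[[Path], str]:
    """Load the model once and return a path -> text function."""
    from faster_whisper import WhisperModel  # lazy import: optional dependency

    model = WhisperModel(model_size, device="cpu", compute_type="int8")

    def transcribe(path: Path) -> str:
        segments, _info = model.transcribe(str(path))
        # segments is a lazy generator: decoding happens while we iterate,
        # so this must finish before the temp file is deleted.
        return " ".join(seg.text.strip() for seg in segments)

    return transcribe
```

Creating the transcriber once at startup (for example, `stt = faster_whisper_transcriber()`, then `transcribe_bytes(audio, stt)` per request) avoids reloading the model for every clip.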

Any advice, libraries, or architectural tips would be greatly appreciated. Thanks!

u/Sea_Lawfulness_5602 — 27 days ago
