Struggling to find the perfect Search/Scraping API
Hey everyone,
I'm building an AI fact-checking pipeline to verify video claims.
The logic is solid, but the Web Search/Extraction layer is a nightmare. Here is our experience so far:
- Tavily: Perfect high-tier sources, but way too expensive at scale.
- Exa.ai: Fast, but their neural search pulls too many low-tier blogs/forums instead of authoritative news, even with strict prompting.
- Jina API: Cheap, with good markdown output, but it rate-limits almost immediately on parallel queries. Payloads are also unpredictable: a massive PDF can burn millions of tokens, while other pages come back with zero content.
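To make the payload problem concrete, here is a minimal sketch of the kind of guard that could sit between any extraction API and the LLM. It drops empty responses and caps oversized ones. The name `clamp_payload` and the `MAX_CHARS` budget are hypothetical, and a character count is only a rough stand-in for a token count.

```python
from typing import Optional

# Hypothetical budget: characters are a crude proxy for tokens,
# so tune this for the tokenizer actually in use.
MAX_CHARS = 20_000


def clamp_payload(text: Optional[str], max_chars: int = MAX_CHARS) -> Optional[str]:
    """Return None for empty extractions, else text capped at max_chars."""
    if text is None:
        return None
    cleaned = text.strip()
    if not cleaned:
        # Extraction returned zero usable content; let the caller retry or skip.
        return None
    if len(cleaned) <= max_chars:
        return cleaned
    cut = cleaned[:max_chars]
    # Back up to the last space so the cut does not split a word.
    head, sep, _ = cut.rpartition(" ")
    return head if sep else cut
```

A guard like this does not fix bad extraction, but it keeps a single huge PDF from consuming the token budget of a whole batch.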
The Goal: We need an API that reliably returns top-tier domains (Reuters, AP, government sites), extracts clean text/markdown, handles concurrent async requests without falling over, and doesn't break the bank.
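As a provider-agnostic illustration of two of these requirements, the sketch below filters results against a domain allowlist and bounds concurrency with a semaphore. Everything named here is an assumption for illustration: `ALLOWED_SUFFIXES` is an example list, and `search_fn` stands for whatever async search call ends up in the stack, assumed to return a list of URLs.

```python
import asyncio
from typing import Awaitable, Callable, List
from urllib.parse import urlparse

# Hypothetical allowlist: exact domains, plus a ".gov" suffix rule.
ALLOWED_SUFFIXES = ("reuters.com", "apnews.com", ".gov")


def is_allowed(url: str) -> bool:
    """True if the URL's host is an allowlisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    for suffix in ALLOWED_SUFFIXES:
        if suffix.startswith("."):
            # TLD-style rule, e.g. any host ending in ".gov".
            if host.endswith(suffix):
                return True
        elif host == suffix or host.endswith("." + suffix):
            return True
    return False


async def search_all(
    queries: List[str],
    search_fn: Callable[[str], Awaitable[List[str]]],
    max_concurrent: int = 4,
) -> List[List[str]]:
    """Run search_fn over all queries, at most max_concurrent at a time."""
    sem = asyncio.Semaphore(max_concurrent)

    async def one(query: str) -> List[str]:
        async with sem:  # caps in-flight requests to respect rate limits
            urls = await search_fn(query)
        return [u for u in urls if is_allowed(u)]

    # gather preserves the order of the input queries.
    return await asyncio.gather(*(one(q) for q in queries))
```

Filtering client-side like this only hides low-tier results after paying for them, so a provider-side domain filter would still be preferable where one exists.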
Currently considering the Perplexity Search API or a DIY Brave Search + Firecrawl stack.
Has anyone built a high-volume RAG pipeline recently? What is the golden stack for Web Search right now?
Thanks