How are Indian startups handling RAG over sensitive documents in production?
Been talking to a few engineers at fintech and insurtech startups about building AI on internal documents — policy PDFs, RBI circulars, KYC files, claims data.
The demo always works. The production version is where things get weird.
What I keep hearing:
- Retrieval breaks on scanned PDFs and anything with tables
- The AI gives an answer but nobody can trace which document it came from
- Permissions are an afterthought — retrieval doesn't know who's asking
- If an auditor asks "what did the system do on this query," there's no answer
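To make the last three bullets concrete, here is a rough sketch of the shape I have in mind, not anyone's actual stack. It filters chunks by the caller's roles before ranking, keeps the source document and page on every hit, and appends one audit record per query. Everything in it is an assumption for illustration: the `Chunk` fields, the `retrieve` signature, the file names, and the word-overlap `score` function, which is only a stand-in for real vector similarity.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str               # source document, kept so answers stay traceable
    page: int
    text: str
    allowed_roles: frozenset  # roles permitted to read this chunk


def score(query: str, text: str) -> int:
    # Toy relevance: number of shared lowercase words.
    # Stand-in for embedding similarity; assumption for illustration only.
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query, user_id, user_roles, chunks, audit_log, k=3):
    roles = set(user_roles)
    # Apply permissions BEFORE ranking, so restricted text never
    # reaches the ranking step or the prompt.
    visible = [c for c in chunks if c.allowed_roles & roles]
    ranked = sorted(visible, key=lambda c: score(query, c.text), reverse=True)
    hits = [c for c in ranked if score(query, c.text) > 0][:k]
    # One record per query: who asked, what was asked, which sources came back.
    audit_log.append({
        "ts": time.time(),
        "user": user_id,
        "query": query,
        "sources": [[c.doc_id, c.page] for c in hits],
        "hidden_by_permissions": len(chunks) - len(visible),
    })
    return hits


# Hypothetical documents, made up for the example.
CHUNKS = [
    Chunk("claims_policy.pdf", 4, "claims policy covers flood damage",
          frozenset({"support", "compliance"})),
    Chunk("kyc_customer_118.pdf", 1, "kyc record with pan and address proof",
          frozenset({"compliance"})),
]
```

Calling `retrieve("kyc policy flood", "u1", {"support"}, CHUNKS, log)` with an empty list `log` returns only the policy chunk, and the log entry records that one chunk was hidden by permissions. The same query with the `compliance` role returns both chunks. In a real system the filter would presumably be pushed down into the vector store as a metadata filter rather than done in Python.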
Most teams end up stitching together LangChain + Pinecone + custom logic, and it holds until it doesn't.
I'm exploring whether a service that lets you configure your documents, retrieval settings, and evals in a UI, and then hands you a single API key, would solve a real problem. Or whether the answer is always "we'd rather own the stack."
Genuinely trying to understand the problem before building anything.
If you've shipped AI on documents in a regulated context, what broke? What did you end up doing about it?