u/False_Specific_1255

Been talking to a few engineers at fintech and insurtech startups about building AI on internal documents: policy PDFs, RBI circulars, KYC files, claims data.

The demo always works. The production version is where things get weird.

What I keep hearing:

- Retrieval breaks on scanned PDFs and anything with tables
- The AI gives an answer but nobody can trace which document it came from
- Permissions are an afterthought: retrieval doesn't know who's asking
- If an auditor asks "what did the system do on this query," there's no answer
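
To make the traceability, permissions, and audit points concrete, here is a rough sketch of what I mean by retrieval that knows who's asking. Everything in it is an assumption for illustration: the `Chunk` and `Retriever` names, the role strings, and the file names are made up, and the keyword-overlap scoring is only a stand-in for a real vector index.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class Chunk:
    doc_id: str               # source document, kept so answers stay traceable
    page: int
    text: str
    allowed_roles: frozenset  # roles permitted to retrieve this chunk


@dataclass
class Retriever:
    chunks: list
    audit_log: list = field(default_factory=list)

    def search(self, query: str, user: str, role: str, k: int = 3) -> list:
        # Apply permissions BEFORE ranking, so restricted text never reaches the model.
        visible = [c for c in self.chunks if role in c.allowed_roles]
        terms = set(query.lower().split())
        # Toy keyword-overlap score; a real system would query a vector index here.
        scored = [(len(terms & set(c.text.lower().split())), c) for c in visible]
        ranked = sorted(scored, key=lambda pair: -pair[0])
        hits = [c for score, c in ranked if score > 0][:k]
        # Record what the system did, so the query can be reconstructed for an auditor.
        self.audit_log.append(json.dumps({
            "ts": time.time(),
            "user": user,
            "role": role,
            "query": query,
            "sources": [[c.doc_id, c.page] for c in hits],
        }))
        return hits


# Hypothetical documents and roles, purely for illustration.
retriever = Retriever(chunks=[
    Chunk("kyc_policy.pdf", 4, "KYC refresh is required every two years",
          frozenset({"compliance"})),
    Chunk("claims_faq.pdf", 1, "Claims are settled within thirty days",
          frozenset({"compliance", "support"})),
])

question = "when is KYC refresh required"
hits_support = retriever.search(question, user="asha", role="support")
hits_compliance = retriever.search(question, user="ravi", role="compliance")
```

In this sketch the support user gets nothing back for the KYC question because the only matching chunk is restricted, while the compliance user gets the chunk along with its `doc_id` and page. Both calls leave a line in `audit_log`, including the one that returned no sources.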

Most teams end up stitching together LangChain + Pinecone + custom logic, and it holds until it doesn't.

I'm exploring whether a service that lets you configure your documents, retrieval settings, and evals in a UI, then just hands you one API key, would actually solve a real problem. Or whether the answer is always "we'd rather own the stack."

Genuinely trying to understand the problem before building anything.

If you've shipped AI on documents in a regulated context, what broke? What did you end up doing about it?

How are Indian startups handling RAG over sensitive documents in production?