Is this solution dumb?
I'm a novice to intelligent systems integration, so any opinions would be appreciated.
I'm building a system which is designed to ingest a user-written article about a very particular domain, let's say it's the coffee industry. We have a vast repository of quantitative and qualitative (prose) data, and we want to query it for information that the user might find enhances their article.
We're structuring the quantitative data in an SQL db and the prose data within the RAG searchable AWS Knowledge Base.
I plan on mediating LLM -> Data communication via MCP which exposes endpoints for template queries. The parameters for each endpoint fill in the placeholders within the templates. A template query would be something like SQL:'revenue for <company> in <region> in <2025>'
My concern is that every time the data returned from the MCP is reproduced by an LLM we introduce hallucination risk. So how about this: every single Knowledge Base or SQL query launched by the MCP gets put into a Redis instance with a TTL of 30 mins.
This way we can have the LLM reason over the results, summarise them for output (and occasionally hallucinate) but the raw data remains immutable within Redis. The LLM's output can be summaries attached to IDs which we use to pull the raw data from Redis before finally giving it back to the user.