Follow-up experiment: does it make sense to add a structured evidence boundary before RAG retrieval?
Title: Follow-up experiment: structured evidence boundaries before RAG retrieval
I posted an earlier discussion about adding a structured evidence boundary before RAG retrieval.
This is a small follow-up experiment around that idea.
Most RAG pipelines start with something like:
Which chunk is semantically closest to the query?
That works well for many use cases.
But I’m exploring whether, in higher-risk domains like legal, compliance, finance, or API specs, the system should first ask a different question:
Which evidence boundary is this query allowed to search inside?
For example, before retrieving text, maybe the system should first narrow the search space by jurisdiction, corpus, source identity, document version, legal unit, or some kind of deterministic address.
The goal is not to replace vector search or BM25.
It is more like adding a routing / evidence-selection layer before retrieval, so downstream search happens inside a smaller and more auditable space.
I put together a small implementation and an RQ7 benchmark pass here:
https://github.com/Void-Ghost000/HSRAG
Important scope limitation: this benchmark does not call an LLM.
It does not test:
generated answer quality
LLM reasoning
legal reasoning
end-to-end chatbot behavior
production vector database performance
It only looks at the retrieval / routing / evidence-selection layer.
So I’m not claiming this “solves RAG” or “makes the model smarter.”
The narrower experiment is about whether structured evidence boundaries can help reduce:
wrong-corpus retrieval
cross-jurisdiction contamination
unsupported queries being forced into a nearby chunk
unnecessary candidate search
weak auditability of retrieved evidence
The repo now has a clearer RQ7 section with key findings, evidence summary, local vector/hybrid baselines, and explicit claim boundaries.
I’d also be interested in hearing from people who want to explore or extend this kind of experiment.
Some directions I’m thinking about:
trying the same retrieval-boundary idea on other corpora or domains
adding better baselines or more adversarial query sets
visualizing the retrieval / routing / evidence-selection path as a hash-audited trace
using a hash audit chain to make retrieved evidence easier to inspect, replay, or challenge
exploring whether this kind of structure could help with memory, provenance, or long-running agent workflows
To be clear, I don’t mean exposing hidden LLM reasoning. I’m more interested in making the external retrieval process, evidence path, and system decisions easier to inspect.
If anyone has related ideas, criticism, implementation suggestions, or wants to try a different experiment design, I’d be happy to discuss.