u/JDubbsTheDev

r/Rag

If you've shipped RAG into production, you've probably hit some version of this: the retrieval is inconsistent across sessions, two queries that should return the same chunks return different ones, your team can't agree on chunk size, and the agent has no way to know whether the passage it just retrieved is well-supported or a one-off line from a single doc that contradicts three others. Reranking helps but doesn't fix the underlying problem, which is that the system has no structural understanding of what's in the corpus, only what's similar to the query.

I've watched people inside companies and in the open-source community attack this from a dozen angles: Team Knowledge Hubs, Local RAG, GraphRAG variants, Confluence retrieval bots, custom pipelines stitched on top of LlamaIndex. Different attempts, same underlying need: a queryable artifact that understands the entities and relationships in the corpus, not just the text similarity. Something a local IDE, a Slack bot, or an agent can hit for real-time context without rebuilding a stale local index per tool, per team, per developer.

This isn't only an engineering problem. CS ops has years of support history. Legal has contract patterns. Implementation teams know customer quirks. SMEs hold things that never got written down. Each of those teams ends up reinventing some retrieval layer or manually pasting context into prompts. Back when I was a Technical Advisor for some pretty complex financial products, I often found myself thinking, "if only there were a shared knowledge layer I could tap into."

I'm not reinventing the wheel. Karpathy's LLM wiki was an early, well-known example, and projects like Microsoft's GraphRAG, LlamaIndex's PropertyGraph, LightRAG, and others have built variations since. What I'm trying to do is define an open standard for the artifact itself. One schema, one query interface. Any compliant tool can read any compliant graph, regardless of which implementation produced it.

The spec is called AKS (Agent Knowledge Standard). It's Apache 2.0 and intentionally not tied to any product. A compiled graph is called a Knowledge Stack, and each stack is portable and shareable, giving you truly global domain context.

A few things worth knowing if you care about retrieval specifically:

The retrieval pattern is two-stage. The reference server's /context endpoint runs hybrid chunk retrieval first — geometric mean of vector similarity and trigram similarity, with a recency multiplier — to surface candidate text. Then one LLM call asks "given these chunks and this entity catalog, which compiled entities are relevant to the query?" The response returns the entity subgraph, not the chunks. Chunks are an intermediate signal, never the final answer. The agent gets compiled knowledge with typed relationships, not text passages it has to reason over.
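A minimal Python sketch of that two-stage flow, under loud assumptions: the trigram function only approximates Postgres `pg_trgm`'s `similarity()`, the recency multiplier is a hypothetical exponential decay with a 180-day half-life, and the stage-two LLM call is replaced by a pluggable `select` callable. Every name here (`Chunk`, `hybrid_score`, `retrieve_subgraph`, `keyword_selector`) is illustrative, not the reference server's code.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    text: str
    embedding: list[float]
    age_days: float  # days since the source document was last updated


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def trigrams(text: str) -> set[str]:
    # Rough approximation of pg_trgm: lowercase, treat non-alphanumerics as
    # word breaks, pad each word with two leading spaces and one trailing space.
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    grams: set[str] = set()
    for word in cleaned.split():
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_sim(a: str, b: str) -> float:
    # Shared trigrams over all distinct trigrams of both strings.
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb) if ta and tb else 0.0


def hybrid_score(query: str, q_emb: list[float], chunk: Chunk,
                 half_life_days: float = 180.0) -> float:
    vec = max(0.0, cosine(q_emb, chunk.embedding))  # clamp negative cosine to 0
    trg = trigram_sim(query, chunk.text)
    recency = 0.5 ** (chunk.age_days / half_life_days)  # hypothetical decay
    return math.sqrt(vec * trg) * recency  # geometric mean x recency


# (query, candidate chunk texts, entity catalog) -> selected entity ids.
# In the reference server this step is a single LLM call.
EntitySelector = Callable[[str, list[str], dict[str, str]], list[str]]


def retrieve_subgraph(query: str, q_emb: list[float], chunks: list[Chunk],
                      catalog: dict[str, str], edges: list[tuple[str, str, str]],
                      select: EntitySelector, k: int = 5) -> dict:
    # Stage 1: rank chunks by hybrid score and keep the top k as evidence.
    ranked = sorted(chunks, key=lambda c: hybrid_score(query, q_emb, c), reverse=True)
    candidates = [c.text for c in ranked[:k]]
    # Stage 2: map the evidence onto compiled entities; chunks are not returned.
    chosen = {eid for eid in select(query, candidates, catalog) if eid in catalog}
    return {
        "entities": {eid: catalog[eid] for eid in sorted(chosen)},
        # Keep only typed edges (src, rel_type, dst) whose endpoints were both chosen.
        "relationships": [e for e in edges if e[0] in chosen and e[2] in chosen],
    }


def keyword_selector(query: str, chunk_texts: list[str],
                     catalog: dict[str, str]) -> list[str]:
    # Toy stand-in for the LLM: pick entities whose name appears in any chunk.
    joined = " ".join(chunk_texts).lower()
    return [eid for eid, name in catalog.items() if name.lower() in joined]
```

Swapping `keyword_selector` for a function that prompts a model with the query, the candidate chunks, and the catalog gives the shape described above: chunks only steer entity selection, and the caller receives entities plus the typed edges between them.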

The geometric mean is the part I'm most uncertain about. It penalizes results where one signal is weak much harder than an arithmetic mean would. A chunk scoring 0.9 vector but 0.1 trigram drops to 0.3 in the geometric mean instead of 0.5. In practice this seems to remove a lot of the semantically-adjacent-but-keyword-unrelated noise that pure vector search surfaces. But I've only tested it on a handful of corpora. I'd love to know what you're actually using and how it compares.
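The 0.9 / 0.1 example is one case of a general identity. With vector score $v$, trigram score $t$, their arithmetic mean $m$, and half-gap $d$:

```latex
m = \frac{v + t}{2}, \qquad d = \frac{v - t}{2}, \qquad
\sqrt{v\,t} = \sqrt{m^{2} - d^{2}}
```

So for a fixed arithmetic mean, the geometric mean shrinks as the two signals diverge and reaches zero when either one is zero. Plugging in $v = 0.9$, $t = 0.1$ gives $m = 0.5$, $d = 0.4$, and $\sqrt{0.25 - 0.16} = 0.3$.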

The spec takes provenance and trust seriously at the schema level. Every entity carries a confidence score, a list of contributing documents, a last_corroborated_at timestamp, and a scope (stack / workspace / domain). Every relationship carries the same. Every document has a content hash, a truncation flag, a source type. Every traversal response returns the path the graph walk actually took. None of these are LLM-judged. They're structural — counting source documents, comparing timestamps, checking hashes. An agent reading the response can grade its own confidence per fact instead of pretending all retrieved content is equally valid. This is the part I think most graph RAG projects underweight, and it's the part of the spec I most want feedback on.
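To make that field list concrete, here is a minimal Python sketch of the provenance fields as dataclasses. Only `last_corroborated_at`, the three scope values, and the existence of a content hash, truncation flag, and source type come from the post; the class names and other field names are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime
from typing import Literal

Scope = Literal["stack", "workspace", "domain"]


@dataclass
class Document:
    doc_id: str
    source_type: str            # e.g. "confluence" or "pdf" (illustrative values)
    text: str
    truncated: bool = False     # True if ingestion had to cut the document short
    content_hash: str = field(init=False)

    def __post_init__(self) -> None:
        # Same bytes always give the same hash, so changed sources are detectable.
        self.content_hash = hashlib.sha256(self.text.encode("utf-8")).hexdigest()


@dataclass
class Provenance:
    confidence: float
    source_doc_ids: list[str]
    last_corroborated_at: datetime
    scope: Scope

    @property
    def source_count(self) -> int:
        # Counted, not judged: distinct documents that assert this fact.
        return len(set(self.source_doc_ids))


@dataclass
class Entity:
    entity_id: str
    name: str
    provenance: Provenance


@dataclass
class Relationship:
    source_id: str
    rel_type: str
    target_id: str
    provenance: Provenance
```

Because `source_count` and `content_hash` are computed from the data itself, two implementations reading the same documents should agree on them, which is the point of keeping these signals structural.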

The reference server is small and readable: FastAPI + Postgres + pgvector. It implements the four endpoints the spec requires: ingest documents and compile them into a graph, return a relevant subgraph for a natural language query, walk the graph from a known entity, and export the whole thing as a portable bundle. There's also an MCP wrapper so Claude Desktop can talk to it directly. The README walks through the architecture decisions explicitly so you can see why each tradeoff was made.
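Walking the graph from a known entity while recording the path could look like the breadth-first sketch below. It is hypothetical: the graph is an in-memory adjacency dict rather than Postgres tables, and `walk`, `max_depth`, and the output keys are illustrative names, not the spec's.

```python
from collections import deque

# Adjacency list: entity id -> [(relationship type, neighbour id), ...]
Graph = dict[str, list[tuple[str, str]]]


def walk(graph: Graph, start: str, max_depth: int = 2) -> dict:
    """Breadth-first walk from a known entity, recording each edge followed."""
    visited = {start}
    path: list[tuple[str, str, str]] = []  # (from, rel_type, to) in visit order
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't expand past the depth limit
        for rel_type, neighbour in graph.get(node, []):
            if neighbour in visited:
                continue  # skip cycles and already-reached entities
            visited.add(neighbour)
            path.append((node, rel_type, neighbour))
            queue.append((neighbour, depth + 1))
    return {"entities": sorted(visited), "traversal_path": path}
```

Returning `traversal_path` alongside the entities lets an agent see how each entity was reached, for example that a database was found only through an intermediate service rather than directly from the starting entity.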

Spec: https://github.com/Agent-Knowledge-Standard/AKS-Specification
Reference server: https://github.com/Agent-Knowledge-Standard/AKS-Reference-Server

What I'd love feedback on:

  • The two-stage retrieval pattern (hybrid scoring → entity identification → subgraph return). Overengineered? Underengineered? What would you change?
  • The geometric mean scoring versus more conventional approaches (RRF, weighted sum, cross-encoder rerank). Has anyone benchmarked these against each other on real corpora?
  • The trust signals at the schema level — confidence, source count, last_corroborated, scope, traversal_path. Right shape? Missing something obvious? Are there signals you've wanted in your own RAG systems that aren't here?
  • Audit and quality scoring as a first-class feature is intentionally out of scope for v0. I want to ship the core graph and retrieval first, see what patterns actually emerge, then standardize audit in v1.
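For anyone who wants to try the comparison from the second bullet, here is a toy Python harness for three fusion rules: reciprocal rank fusion with the commonly used k = 60, an equal-weight sum, and the geometric mean. The four documents and their scores are invented to show the qualitative difference; they are not benchmark results.

```python
import math


def rank_positions(scores: dict[str, float]) -> dict[str, int]:
    """1-based rank of each doc when sorted by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {doc: i + 1 for i, doc in enumerate(ordered)}


def rrf(*signals: dict[str, float], k: int = 60) -> dict[str, float]:
    # Reciprocal rank fusion: sum of 1 / (k + rank) across signals.
    fused: dict[str, float] = {}
    for signal in signals:
        for doc, rank in rank_positions(signal).items():
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return fused


def weighted_sum(vec: dict[str, float], trg: dict[str, float], w: float = 0.5):
    return {d: w * vec[d] + (1 - w) * trg[d] for d in vec}


def geometric(vec: dict[str, float], trg: dict[str, float]):
    return {d: math.sqrt(vec[d] * trg[d]) for d in vec}


def order(scores: dict[str, float]) -> list[str]:
    return sorted(scores, key=scores.get, reverse=True)


# Made-up scores: "a" is a strong vector match with almost no lexical overlap.
vec = {"a": 0.9, "b": 0.6, "c": 0.7, "d": 0.3}
trg = {"a": 0.1, "b": 0.5, "c": 0.2, "d": 0.4}

for name, fused in [("RRF", rrf(vec, trg)), ("weighted", weighted_sum(vec, trg)),
                    ("geometric", geometric(vec, trg))]:
    print(name, order(fused))
# RRF ['b', 'a', 'c', 'd']
# weighted ['b', 'a', 'c', 'd']
# geometric ['b', 'c', 'd', 'a']
```

Document `a` stays second under RRF and the weighted sum but drops to last under the geometric mean. That is the noise suppression the post describes, and also its risk: a genuinely relevant paraphrase with little trigram overlap gets buried the same way.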

If anyone wants to spin up the reference server and break it, the README has a Docker Compose setup. I'd genuinely appreciate adversarial users more than cheerleaders here.

u/JDubbsTheDev — 23 days ago

The problem is something I've watched people at work and in the community try to solve over and over in different ways: Team Knowledge Hubs, Local RAG for development environments, one-off retrieval pipelines bolted onto Confluence. Different teams, different attempts, same underlying need: an artifact that understands the history and connections across the ecosystem, so your local IDE or agent can query it for real-time context without every user having to maintain their own local index.

This is not just an engineering problem, though. Every team in a company has knowledge their AI tools need. For example: CS ops has years of support history, a legal team has contract patterns and obligations, an implementation team knows every customer's quirks, and SMEs hold things that never got written down. Today, every one of those teams either pastes context into prompts, builds a one-off RAG index that goes stale, or just doesn't get to use AI well at all because their company only lets them use Gemini in a Google UI. Worse, when one person's Claude Code retrieves from those docs, the next person's Cursor retrieves differently. Same docs, different chunks, different answers. There's no shared picture across people, sessions, or tools. Back when I was a Technical Advisor for some pretty complex financial products, I often found myself thinking, "if only there were a shared knowledge layer I could tap into."

I'm not reinventing the wheel here. Karpathy's LLM wiki kicked off a wave of projects compiling domain knowledge into structured forms LLMs can use, and a bunch of teams have built variations since. What I'm trying to do is define a standard for it. One format, one query interface. Any compliant tool can read any compliant graph.

The structural fix that all of these projects (mine included) are converging on is: stop pretending each tool can maintain its own world view and instead compile one shared picture every tool reads from. Not a vector index, but a graph. Domains and entities the team works with, typed relationships between them, source attribution, confidence. Built once from the team's source material and queryable by any compliant tool.
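As a hypothetical illustration of "built once from the team's source material," the Python sketch below merges per-document triples into a single graph keyed by typed edge, keeping every contributing document. The extraction step is assumed to have already happened (in practice likely an LLM call per document), `compile_graph` is an invented name, and the normalization rules are deliberately naive.

```python
from collections import defaultdict

Triple = tuple[str, str, str]  # (subject, relationship type, object)


def compile_graph(extracted: dict[str, list[Triple]]) -> dict[Triple, list[str]]:
    """Merge per-document triples into one graph keyed by typed edge.

    `extracted` maps document id -> triples pulled from that document.
    Each edge in the result keeps the sorted ids of the documents asserting it.
    """
    sources: dict[Triple, set[str]] = defaultdict(set)
    for doc_id, triples in extracted.items():
        for subj, rel, obj in triples:
            # Naive normalisation so "Ledger" and "ledger " become one entity.
            key = (subj.strip().lower(), rel.strip().upper(), obj.strip().lower())
            sources[key].add(doc_id)
    return {edge: sorted(docs) for edge, docs in sources.items()}


graph = compile_graph({
    "runbook.md": [("Billing", "depends_on", "Ledger")],
    "adr-12.md": [("billing", "DEPENDS_ON", "ledger "),
                  ("Ledger", "owned_by", "Payments Team")],
})
for edge, docs in graph.items():
    print(edge, len(docs), docs)
# ('billing', 'DEPENDS_ON', 'ledger') 2 ['adr-12.md', 'runbook.md']
# ('ledger', 'OWNED_BY', 'payments team') 1 ['adr-12.md']
```

The per-edge document list is the source attribution, and its length is a count-based confidence signal that needs no model to compute.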

I called the spec AKS (Agent Knowledge Standard). It's licensed under Apache 2.0, intentionally not tied to any product, and I'd like it to be community-governed. A team's compiled graph is called a Knowledge Stack. SMEs can compile their own. Engineering can compile theirs. Anyone's agent can query any of them.

One thing I want to highlight because it's underrated in most RAG conversations: the spec takes provenance and trust seriously at the schema level. Every entity carries a confidence score, a list of contributing documents, a last_corroborated_at timestamp, and a scope (stack / workspace / domain). Every relationship carries the same. Every document carries a content hash, a truncation flag, a source type. Every traversal response returns the path the system actually walked. The signals are structural, not LLM-judged. An agent reading from a Stack can grade its own confidence per fact instead of pretending all retrieved text is equally valid.
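One way an agent could turn those structural signals into a per-fact grade, sketched in Python. The buckets and thresholds (one year for staleness, three sources for "strong") are assumptions for illustration and are not part of the AKS spec; `Fact`, `grade`, and `any_source_truncated` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Fact:
    statement: str
    source_doc_ids: list[str]
    last_corroborated_at: datetime      # use timezone-aware datetimes throughout
    any_source_truncated: bool = False  # rolled up from per-document flags


def grade(fact: Fact, now: datetime,
          stale_after: timedelta = timedelta(days=365)) -> str:
    """Bucket a fact using only structural signals (no LLM judgement)."""
    sources = len(set(fact.source_doc_ids))
    stale = now - fact.last_corroborated_at > stale_after
    if sources == 0:
        return "unsupported"
    if sources == 1 or stale or fact.any_source_truncated:
        return "weak"
    return "strong" if sources >= 3 else "moderate"


example = Fact("Billing depends on Ledger", ["runbook.md", "adr-12.md"],
               datetime(2024, 12, 1, tzinfo=timezone.utc))
print(grade(example, now=datetime(2025, 1, 1, tzinfo=timezone.utc)))  # moderate
```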

The reference server is FastAPI + Postgres + pgvector. It implements the four things the spec requires: ingest documents and compile them into a graph, return a relevant subgraph for a natural language query, walk the graph from a known entity, and export the whole thing as a portable bundle. It also has an MCP wrapper so Claude Desktop can talk to it directly.
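A hypothetical sketch of the export step: serialize the stack to canonical JSON and attach a SHA-256 digest so a receiving tool can detect a corrupted or hand-edited bundle. The bundle layout and the `export_bundle` / `load_bundle` names are invented here; the spec defines the real format.

```python
import hashlib
import json


def _canonical(body: dict) -> str:
    # Sorted keys and fixed separators give byte-identical output for equal data.
    return json.dumps(body, sort_keys=True, separators=(",", ":"))


def export_bundle(entities: list[dict], relationships: list[dict],
                  documents: list[dict]) -> str:
    """Serialize a stack to one JSON string with an integrity hash."""
    body = {"entities": entities, "relationships": relationships,
            "documents": documents}
    digest = hashlib.sha256(_canonical(body).encode("utf-8")).hexdigest()
    return json.dumps({"bundle_sha256": digest, "body": body}, sort_keys=True)


def load_bundle(raw: str) -> dict:
    """Parse a bundle and refuse it if the body no longer matches its hash."""
    bundle = json.loads(raw)
    digest = hashlib.sha256(_canonical(bundle["body"]).encode("utf-8")).hexdigest()
    if digest != bundle["bundle_sha256"]:
        raise ValueError("bundle body does not match bundle_sha256")
    return bundle["body"]
```

Hashing a canonical serialization rather than the raw file means two tools that produce the same graph data agree on the digest even if they format the JSON differently.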

Spec: https://github.com/Agent-Knowledge-Standard/AKS-Specification
Reference server: https://github.com/Agent-Knowledge-Standard/AKS-Reference-Server

What I'd love feedback on:

  • Does the problem actually match something you've hit, or am I solving a thing that doesn't really exist for most people?
  • The retrieval pattern is two-stage: hybrid chunk scoring to find candidate text, one LLM call to identify which compiled entities are relevant, then return the entity subgraph instead of the chunks. Is this overengineered or about right?
  • Are the trust signals on entities and relationships (confidence, source count, last corroborated, scope) the right shape, or am I missing something obvious?
  • Audit and quality scoring as a first-class feature is intentionally out of scope for v0. I want to ship the core graph and retrieval first, then revisit audit once a few implementations exist and we can see which patterns matter.

If anyone wants to spin up the reference server and try it, the README has a Docker Compose setup. I'd genuinely appreciate someone breaking it.
