r/graphql

▲ 32 r/graphql+27 crossposts

We ran a 1,655-person blind study on AI memory. The results changed how we think about the problem.

We’re building KAPEX (getkapex.ai), memoryware for AI applications. Two co-founders, bootstrapped, patent pending. I wanted to share some of what we’ve learned because the discourse in this space keeps circling the same assumptions and I think a few of them are wrong.

The study: 1,655 participants interacted with AI systems with and without our memory layer. The setup was blind: they didn’t know which condition they were in.
The finding that mattered most: first-session preference for the memory condition was around 65%. Not bad, but not a clear signal. After 20+ sessions, preference climbed past 80% and kept rising. The longer people used it, the wider the gap.

That trajectory is the insight. Not the final number. The trajectory.

Here’s why that matters for anyone building in this space:

Most AI memory tools are optimized for first impressions. Demo well, retrieve fast, show the user you remembered their name. That’s fine. But it means the entire evaluation framework for memory (including the benchmarks everyone cites) is testing the wrong thing. LongMemEval and LoCoMo test whether you can find what was said. They don’t test whether the system knows what still matters.
Retrieval and relevance are different problems. The industry has spent two years building better retrieval. Almost nobody is building relevance governance: what stays important, what fades, what gets superseded, and whether the user can see and correct what the system believes.

Three things we learned the hard way:

1. Clean store beats fancy retrieval. Every time. If your memory layer lets stale context accumulate without governance, no amount of reranking or hybrid search fixes the degradation over time. The capture and maintenance side is where the leverage actually is.

2. Memory without transparency is a black box. If developers can’t see why the agent believes something, and users can’t see what the system thinks it knows about them, then memory becomes a liability rather than a feature. Inspectability isn’t a nice-to-have. It’s what makes correctability possible.

3. The value of memory is invisible in short sessions. This is why benchmarks miss it. A 5-turn evaluation can’t distinguish between a system with real governance and one that just retrieved the right vector. The difference only shows up after sustained use, which is also when it matters most.

Our approach treats relevance as something that should be handled continuously by the architecture, not at query time by the retrieval layer. Context that stops being reinforced through usage naturally loses priority. Not deleted, just deprioritized. That’s the principle. Can’t share more on implementation for IP reasons.
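As a generic illustration of that principle only (this is not KAPEX's implementation, which isn't public), here is a toy sketch of usage-reinforced priority with exponential decay. Everything in it is an assumption made up for the example: the `MemoryItem` fields, the `priority` formula, and the 30-day half-life.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    last_reinforced_day: float  # day the item was last used or confirmed
    reinforcements: int = 1     # how many times usage has reinforced it


def priority(item: MemoryItem, today: float, half_life_days: float = 30.0) -> float:
    """Score that halves every `half_life_days` without reinforcement."""
    age = max(0.0, today - item.last_reinforced_day)
    decay = 0.5 ** (age / half_life_days)
    # log1p dampens the effect of very frequent reinforcement (an arbitrary choice)
    return math.log1p(item.reinforcements) * decay


def reinforce(item: MemoryItem, today: float) -> None:
    """Record a use: bump the count and reset the decay clock."""
    item.reinforcements += 1
    item.last_reinforced_day = today


def ranked(items: list[MemoryItem], today: float) -> list[MemoryItem]:
    """Order by priority. Nothing is deleted; stale items just sink."""
    return sorted(items, key=lambda m: priority(m, today), reverse=True)
```

In this toy version an item reinforced three times but untouched for 100 days ranks below an item mentioned once ten days ago, and both stay in the store.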

Curious what others here are seeing. Is anyone else finding that the retrieval-first paradigm breaks down over time? And is anyone working on evaluation frameworks that test sustained-use performance rather than single-session recall?

getkapex.ai if you want to follow along. Still pre-launch but opening access soon.

u/sandstone-oli — 3 hours ago
▲ 146 r/graphql+14 crossposts

Glia – Local-first shared memory layer (SQLite-vec + FTS5 + Offline Knowledge Graph)

Hey everyone,

I wanted to share a project I've been working on called Glia. It is a 100% offline, local-first RAG and memory layer designed to connect your AI web chats (Claude, ChatGPT, DeepSeek) with your local developer tools (Claude Code, Cursor, Windsurf) using a unified local database.

I wanted something lightweight that did not require pulling heavy Docker containers or subscribing to third-party memory APIs. I settled on a Node.js + SQLite architecture running sqlite-vec (for 768-dim float32 embeddings) alongside SQLite FTS5 for hybrid search, powered completely by local Ollama instances.

We just launched a live website that outlines the details and demonstrates the features in action.

Technical Stack & Features:

  • Hybrid Search Retrieval: SQLite-vec (using nomic-embed-text locally) + FTS5 keyword prefix matching (porter stemmer).
  • Surgical Sentence-level Trimming: Chunks are sliced into sentences. When a prompt is intercepted, only the exact matching sentences are pulled out of the vector store instead of the whole paragraph. It cuts LLM prompt bloat by ~90-95% in my benchmarks.
  • Knowledge Graph Extraction: An offline task queue uses a local LLM (llama3.1:8b via Ollama) to extract entity triples (subject-relation-object). These are stored in a SQLite facts table (or Neo4j if you run the full Docker compose profile) and fused with the vector retrieval score.
  • HyDE (Hypothetical Document Embeddings): Queries are pre-processed to generate a hypothetical answer, which is embedded together with the original query to bridge semantic gaps.
  • Concurrency: Running SQLite in WAL (Write-Ahead Logging) mode lets the browser extension dashboard and active MCP sessions read while another connection is writing, so readers and the writer do not block each other (writes themselves are still serialized).
  • PII Redaction: Aggressive scrubbing of JWTs, API keys, emails, and IPs in the extension before data is saved.
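
To give a feel for the PII redaction step listed above, here is a minimal regex-based sketch. The patterns, the labels, and the `redact` function are illustrative assumptions, not Glia's actual rules, and real scrubbing would need broader coverage (IPv6, more key formats, and so on).

```python
import re

# Illustrative patterns only; these are assumptions, not Glia's actual rules.
PATTERNS = [
    # Three base64url segments, the first starting with "eyJ" (an encoded '{"')
    ("JWT", re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")),
    ("EMAIL", re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")),
    ("IPV4", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    # Hypothetical key shape: a short prefix, a separator, then 16+ key characters
    ("API_KEY", re.compile(r"\b(?:sk|pk|api|key)[-_][A-Za-z0-9_-]{16,}")),
]


def redact(text: str) -> str:
    """Replace every match of each pattern with a labeled placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text
```

Running the patterns in a fixed order and using uppercase placeholders keeps a later pattern from re-matching text that an earlier one already replaced.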

The extension works on Claude.ai, ChatGPT, DeepSeek, Gemini, Grok, and Mistral. The MCP server runs off the same backend database for your terminal agent or Cursor.

You can set it up with a single command: npx glia-ai-setup

Glia is completely open-source (MIT). If you like the local-first approach or want to contribute to the SQLite vector pipeline, PRs are very welcome, and a star on GitHub helps the project get discovered!

I would appreciate any feedback on the SQLite hybrid search scaling, the scoring fusion algorithm (RAG pipeline details are in RAG_PIPELINE.md), or local graph extraction performance!
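
As a baseline for that fusion discussion: reciprocal rank fusion (RRF) is one common way to merge a vector ranking with a keyword ranking. The sketch below is generic RRF, not necessarily what Glia does (the actual algorithm is whatever RAG_PIPELINE.md describes). The function name `rrf_fuse`, the chunk IDs, and the choice of `k = 60` (the commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first rankings of chunk IDs with reciprocal rank fusion.

    Each ranking contributes 1 / (k + rank) per chunk, with rank starting at 1.
    Chunks missing from a ranking simply get no contribution from it.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical results from the two retrievers, best match first
vector_hits = ["c1", "c2", "c3"]
keyword_hits = ["c3", "c1", "c4"]
fused = rrf_fuse([vector_hits, keyword_hits])
```

RRF only needs ranks, so it avoids having to normalize cosine distances against BM25-style scores. The trade-off is that it discards how far apart the raw scores were.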

u/Better-Platypus-3420 — 3 days ago
▲ 19 r/graphql

Viaduct 1.0: Airbnb’s open-source GraphQL framework

Airbnb released Viaduct 1.0 today.

Viaduct is Airbnb’s open-source GraphQL framework, built around a shared multi-tenant runtime where teams can contribute domain-owned modules to one central schema.

The part I think is especially interesting for GraphQL teams is the difference in distribution model:

Federation distributes development by distributing servers.

Viaduct distributes development by distributing modules.

The post also explains how Viaduct can still participate as a subgraph in a federated architecture, so it is not framed as “federation vs Viaduct” in a simplistic way.

Official post:

https://medium.com/airbnb-engineering/viaduct-1-0-and-the-future-of-airbnbs-data-mesh-6bab4ec98b89

GitHub:

https://github.com/airbnb/viaduct

Curious how people here think about the module-based approach versus the more common subgraph-server model.

u/jeanleonino — 8 days ago

Query/Mutation generation using AI

Hi, I am trying to figure out a way to create accurate queries/mutations on the fly using AI (OpenAI models).

My goal is to be able to generate these queries and mutations at runtime based on user prompts without loading the whole schema into the context.

I have tried using codegen to validate the queries/mutations and providing examples, but the hard part is building context. I was thinking of perhaps using the descriptions in the schema itself to build a vector DB of sorts, but I am kind of stuck.
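
To make the idea concrete, here is a rough sketch of what such a lookup could look like. It assumes a made-up schema excerpt (`SCHEMA_DOCS`) and uses plain bag-of-words cosine similarity as a stand-in for real embeddings and a vector DB. The point is only the shape: index each field's description, retrieve the few fields relevant to the user prompt, and send just those to the model instead of the whole schema.

```python
import math
import re
from collections import Counter

# Hypothetical schema excerpt: "Type.field" -> description taken from the SDL.
SCHEMA_DOCS = {
    "Query.orders": "List orders placed by a customer, filterable by status and date",
    "Query.products": "Search the product catalog by name or category",
    "Mutation.cancelOrder": "Cancel an existing order that has not shipped yet",
    "Mutation.updateProfile": "Update the name or email address of the current user",
}


def tokenize(text: str) -> Counter:
    """Lowercase word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relevant_fields(prompt: str, docs: dict[str, str], top_k: int = 2) -> list[str]:
    """Return up to top_k field names whose name and description overlap the prompt."""
    query = tokenize(prompt)
    scored = [(cosine(query, tokenize(f"{name} {desc}")), name) for name, desc in docs.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for score, name in scored[:top_k] if score > 0]
```

With this toy data, `relevant_fields("cancel my order", SCHEMA_DOCS)` returns only `Mutation.cancelOrder`, and that field's SDL (plus the types it references) would be what goes into the prompt. Word overlap misses things like "orders" versus "order", which is exactly the gap real embeddings are meant to close.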

Any thoughts? Thanks!

u/S123Peel — 12 days ago