u/SearchFlashy9801 — reddlx

Quick context: I have been hitting Claude Code Max 5x limits in under 2 hours on real work. The session counter goes from 21% to 100% on a single complex prompt. If you have been on the recent threads, you know exactly what I mean.

So I built engramx. It is an MCP server plus a SQLite knowledge graph that intercepts file reads at the agent boundary. When Claude is about to read a file engram has indexed, the hook returns a structural summary instead of the raw content. Same edit, same diff, far fewer tokens consumed in the round trip.
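
For anyone who wants the interception idea in code, here is a minimal sketch. It is not engram's actual implementation: `GraphIndex`, `interceptRead`, and the summary string are hypothetical names made up for illustration. It only shows the shape of the idea, assuming the index maps a file path to a short structural summary: return the summary when the file is indexed, otherwise fall through to the raw read.

```typescript
// Hypothetical sketch of read interception; not engramx's real API.
type ReadResult = { source: "summary" | "raw"; text: string };

// Stand-in for the knowledge graph: maps file path -> structural summary.
class GraphIndex {
  private summaries = new Map<string, string>();

  add(path: string, summary: string): void {
    this.summaries.set(path, summary);
  }

  get(path: string): string | undefined {
    return this.summaries.get(path);
  }
}

// If the file is indexed, return its summary; otherwise read the raw content.
function interceptRead(
  index: GraphIndex,
  path: string,
  readRaw: (p: string) => string,
): ReadResult {
  const summary = index.get(path);
  if (summary !== undefined) {
    return { source: "summary", text: summary };
  }
  return { source: "raw", text: readRaw(path) };
}

// In-memory "file system" so the sketch runs without touching disk.
const files: Record<string, string> = {
  "src/auth.ts": "export function login() { /* ...many lines... */ }",
  "src/new.ts": "export const x = 1;",
};

const index = new GraphIndex();
index.add("src/auth.ts", "auth.ts: exports login(); imports db, session");

const hit = interceptRead(index, "src/auth.ts", (p) => files[p] ?? "");
const miss = interceptRead(index, "src/new.ts", (p) => files[p] ?? "");
console.log(hit.source, miss.source); // prints "summary raw"
```

The token saving comes entirely from the summary being shorter than the file, so an unindexed file costs exactly what it did before.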

The benchmark is committed to the repo. On a real 87-file codebase, the aggregate reduction is 89.1%. The best-case file dropped from 18,820 tokens to 306. The bench script is bench/real-world.ts, and you can run it on any project you own.
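
If you want to sanity-check the percentages yourself, the arithmetic is just one minus the ratio of reduced to baseline tokens. A small sketch follows; `reductionPct` is a helper name I made up here, not something from the bench script. The aggregate inputs are the rounded totals quoted further down in this post (163k baseline, 17.7k with engram).

```typescript
// Percentage reduction between a baseline token count and a reduced one.
function reductionPct(baseline: number, reduced: number): number {
  if (baseline <= 0) {
    throw new Error("baseline must be positive");
  }
  return (1 - reduced / baseline) * 100;
}

// Best-case file: 18,820 tokens down to 306.
const bestCase = reductionPct(18820, 306);

// Aggregate, using the rounded totals: 163k tokens down to 17.7k.
const aggregate = reductionPct(163000, 17700);

console.log(bestCase.toFixed(1));  // "98.4"
console.log(aggregate.toFixed(1)); // "89.1"
```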

v3.4 shipped Friday and all the install paths are live now. The same engram works across 8 IDEs natively: Claude Code (hooks plus the official plugin in review), Cursor (MDC plus MCP plus a VS Code extension on OpenVSX), Cline, Continue.dev, Aider, Windsurf, Zed, and OpenAI Codex CLI. One install, one graph, every tool benefits.

It is local-first. SQLite database lives at .engram/graph.db in your repo. Nothing leaves your machine. Apache 2.0. No account, no telemetry.

npm install -g engramx
cd ~/your-project
engram setup

Cursor users can install the extension directly:

code --install-extension nickcirv.engram-vscode

Heads up on what comes next. v4.0 "Mesh + Spine" lands May 25. Adds an opt-in federation layer so engram instances on different machines exchange mistakes and ADRs without sharing source. Phase 1 foundation already merged this week (ed25519 identity, 14-category PII gate, 1007 tests). Subscribe via the GitHub Discussions page if you want updates.

There is also an engram cost command that tracks how many tokens it has saved you, per project per week. After 24 hours of normal use the digest shows real numbers.

Repo and benchmark: github.com/NickCirv/engram

Happy to answer questions. If you have hit the new rate limits and want a second pair of hands on it, comment your stack and I will help.

Six months ago I started a side project because Claude Code kept forgetting things I'd already explained. My architecture, the weird reason that one function exists, what broke last deploy. Every new session I'd burn 5-10k tokens just getting it back up to speed.

I tried the obvious stuff first — bigger CLAUDE.md, dumping README files into context. CLAUDE.md got bloated to the point Claude was reading 8k of stale notes before touching any actual code. Wasn't working.

So I built engramx. It's a local memory layer — SQLite file in your repo at `.engram/graph.db`, no cloud, no telemetry, no account. Builds a knowledge graph of your codebase via AST parsing, then a PreToolUse hook intercepts every Read/Edit/Write/Bash and slips in a small "rich packet" of relevant context before Claude sees the file.

Two things I'm proud of in v3.0:

1. It remembers your mistakes. When something breaks, engram writes a regret-buffer entry. Next session, when Claude touches that file, the past mistake surfaces at the top of context with a warning. v3.0 also added an opt-in mistake-guard that can outright block a tool call against a file with known landmines.
2. I committed an actual benchmark to the repo. I ran it on my own 87-file codebase: the baseline of raw-reading every file came to 163k tokens, versus 17.7k tokens with engram. That is an 89.1% reduction, and 85 of 87 files saved tokens. It is reproducible with `npx tsx bench/real-world.ts`. If anyone publishes a comparable benchmark for any other AI memory tool, I'll add it to the README. I haven't found one yet.
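
For the mistake-memory part, here is a rough sketch of how a regret buffer plus an opt-in guard could fit together. This is an illustration under my own assumptions, not engram's real data model: `RegretBuffer`, `checkToolCall`, and the `severe` flag (my stand-in for "known landmine") are all hypothetical names.

```typescript
// Hypothetical sketch of a regret buffer with an opt-in guard; not engramx's real API.
type Regret = { file: string; note: string; severe: boolean };
type Decision = { allow: boolean; warnings: string[] };

class RegretBuffer {
  private entries: Regret[] = [];

  // Record a past mistake against a file.
  record(entry: Regret): void {
    this.entries.push(entry);
  }

  // All recorded mistakes for one file.
  forFile(file: string): Regret[] {
    return this.entries.filter((e) => e.file === file);
  }
}

// Always surface past mistakes as warnings; block only when the guard is
// enabled AND at least one entry for the file is marked severe.
function checkToolCall(
  buffer: RegretBuffer,
  file: string,
  guardEnabled: boolean,
): Decision {
  const past = buffer.forFile(file);
  const warnings = past.map((e) => `WARNING (${e.file}): ${e.note}`);
  const blocked = guardEnabled && past.some((e) => e.severe);
  return { allow: !blocked, warnings };
}

const buffer = new RegretBuffer();
buffer.record({
  file: "deploy.sh",
  note: "editing the migrate step broke the last deploy",
  severe: true,
});

const guardOff = checkToolCall(buffer, "deploy.sh", false);
const guardOn = checkToolCall(buffer, "deploy.sh", true);
console.log(guardOff.allow, guardOn.allow); // prints "true false"
```

The warning shows up either way; only the blocking behaviour is opt-in, which matches how the guard is described above.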

Install is `npm i -g engramx && engram init && engram install-hook`. Apache 2.0. https://github.com/NickCirv/engram

Honest question for this sub: what does your CLAUDE.md look like right now? I'm trying to figure out where the line is between "useful context" and "bloat that wastes tokens."