
ContextForge: a local proxy that cut my Claude Code token usage by up to 72%
Hi everyone,
I’ve been working on a project to address a specific frustration I had with AI coding agents: token waste. I noticed that agents often burn a significant portion of the context window just re-reading the same files to find functions or re-discovering the repository structure on every turn.
I built ContextForge — a local proxy and CLI that acts as a "codebase-aware" runtime.
How it works
ContextForge sits between your agent (like Claude Code) and your LLM provider. Instead of letting the agent "guess" where files are, it provides local intelligence:
- Local AST Graph: It indexes your repo using native C++ parsing into a local SQLite graph. When the agent needs to find a symbol, the proxy handles the lookup locally.
- Context Optimization: It applies a compression pipeline that skeletonizes older file history (keeping only signatures) and vaults oversized responses (like lockfiles), replacing them with pointers.
- Protocol Translation: It translates Anthropic requests into OpenAI format, which allows you to run Claude Code against Ollama/OpenAI-compatible models with full streaming support.
Case Study: "Soft-Delete" Feature
To test the architecture, I implemented a complex feature in an Express.js backend using an Ollama model. I compared a raw session (Passthrough) against one routed through ContextForge.
| Metric | Passthrough Mode | ContextForge Mode | Difference |
|---|---|---|---|
| LLM round-trips | 41 | 14 | 66% fewer |
| Input tokens | 1,632,266 | 444,092 | 72.8% fewer |
| Output tokens | 1,632,266 | 384,033 | 76.5% fewer |
| Session Compression | — | 60,059 (13.5%) | — |
Understanding the Metrics:
- Workflow Savings (72.8%): These are tokens that were never generated because the tooling changed the workflow. The model used the local graph to find symbols instead of "guessing" via file searches, solving the task in 14 steps instead of 41.
- Session Compression (13.5%): This is the actual text removed from the prompts within the session via skeletonization and deduplication.
Note: These results are from a specific, repository-heavy task. Savings vary significantly based on the work—long refactors benefit most, while short chats benefit much less.
Get Started
I've just released v1.0.3 and I'm looking for feedback from the community
Install: npm i -g @anuj612/contextforge
GitHub: https://github.com/anujkushwaha612/ContextForge
Note: No compiler needed — ships with prebuilt native binaries for Windows, macOS, and Linux via npm.
I’d love to hear your thoughts on the project and to tackle the new bugs and issues coming forward.