u/AbjectBug5885

After 6 months of tuning my Claude Code MCP setup, I found 5 patterns that actually save tokens

After 6 months of tuning my Claude Code MCP setup, I found 5 patterns that actually save tokens

I'm a senior backend engineer using Claude Code as my daily driver since November. I added MCP servers, hated my context bar, started instrumenting everything. After ~600 hours of usage I distilled the savings down to five patterns. Calling it the SCOPE rule.

Numbers below are from my own setup (Sonnet, 6 active MCPs, ~110 tools at peak), measured across roughly 4,000 turns.

S - Strip tool descriptions

  • Bad: ship the MCP author's marketing copy as-is
  • Good: rewrite every tool's description to one sentence, verb-led, action-clear
  • Example: "Search across all your Slack channels and DMs to find messages matching natural language queries with full filtering support" → "Search Slack messages by query string"
  • Result on my setup: -11k input tokens per cold-start turn. ~30% of total MCP overhead came from description bloat alone.

C - Cap visible tools at 20

  • Past 20 tools in context, model accuracy on tool-selection drops measurably
  • My eval (200 fixed queries): 94% accuracy at 18 tools, 71% accuracy at 110 tools
  • The "fix" isn't a smarter model. It's fewer visible tools. Past 20, you need a gateway pattern.
  • Result: 23-point accuracy improvement, also tokens drop because only top-K loads.

O - One-scope-per-purpose

  • --scope user puts a server in every Claude session forever. Most don't belong there.
  • Use --scope project for project-specific work, --scope user only for cross-cutting (filesystem, git, GitHub)
  • My setup: 6 active MCPs across 4 different scopes. Any single Claude Code session sees 2-3 of them.
  • Result: -8k input tokens per turn on average, because most sessions don't load all 6 servers.

P - Prefer keyword ranking over embeddings

  • Cosine similarity over tool descriptions sounds smart, fails on short structured text
  • My eval (200 queries, same as above): BM25 = 81% top-1, semantic embeddings = 64%, hybrid = 78%
  • This is opposite of document RAG defaults. Tool descriptions are not paragraphs.
  • Result: better selection accuracy AND no embedding API cost AND offline ranking.

E - Eject Docker if you can

  • If your gateway runs as a separate service (Docker, sidecar, sidecar-as-a-service), you've added an ops surface you don't need
  • In-process libs that compile-in (Rust + NAPI-RS in the case I'm running, Ratel) collapse this to zero ops
  • Result on my setup: no service to monitor, no port to expose, install is pnpm add -g @ ratel-ai/cli + one command (ratel mcp import).

Worked example from last week

Before SCOPE: cold start 41k input tokens. Tool-selection accuracy on a known-correct query set: 71%. Average response time 4.8 seconds.

After SCOPE: cold start 4.1k input tokens. Tool-selection accuracy: 94%. Average response time 1.9 seconds.

10x token reduction, 23-point accuracy gain, 2.5x latency improvement. Numbers from my own usage, not a vendor benchmark.

Notes on the math

These results are specific to a Claude Code + MCP setup. If you're not using MCP, the description-strip and gateway points still apply (any agent loop with N tools has the same problem). The scope point is Claude-Code-specific.

The first three are free. Anyone with ~/.claude.json write access can ship them today. The fourth and fifth need either a gateway library or rolling your own ranking.

I'd be curious what other people are measuring, especially anyone running 5+ MCPs in production. What's your cold-start token cost?

u/AbjectBug5885 — 8 days ago

Why is every "context layer" tool lying about token savings?

I've been shipping agents for a year and a half. Lately every other launch is a "context layer" or "MCP optimizer" promising 70-90% token cuts.

I've installed five of them. Same story:

  • README chart with no methodology
  • "Benchmark code coming soon"
  • The savings only show up on the demo corpus, not on my actual Claude Code with 6 MCP servers and 140-something tools

If your tool actually cuts tokens at scale, ship the corpus, the queries, the seed, the model, the cost. Anything else is a screenshot.

I want to find one of these that works. So far receipts from zero of them. Anyone seen a benchmark that survives sniff-testing?

reddit.com
u/AbjectBug5885 — 10 days ago