u/AbjectBug5885

I'm a senior backend engineer using Claude Code as my daily driver since November. I added MCP servers, hated my context bar, started instrumenting everything. After ~600 hours of usage I distilled the savings down to five patterns. Calling it the SCOPE rule.

Numbers below are from my own setup (Sonnet, 6 active MCPs, ~110 tools at peak), measured across roughly 4,000 turns.

S - Strip tool descriptions

Bad: ship the MCP author's marketing copy as-is
Good: rewrite every tool's description to one sentence, verb-led, action-clear
Example: "Search across all your Slack channels and DMs to find messages matching natural language queries with full filtering support" → "Search Slack messages by query string"
Result on my setup: -11k input tokens per cold-start turn. ~30% of total MCP overhead came from description bloat alone.

C - Cap visible tools at 20

Past 20 tools in context, model accuracy on tool-selection drops measurably
My eval (200 fixed queries): 94% accuracy at 18 tools, 71% accuracy at 110 tools
The "fix" isn't a smarter model. It's fewer visible tools. Past 20, you need a gateway pattern.
Result: 23-point accuracy improvement, also tokens drop because only top-K loads.

O - One-scope-per-purpose

--scope user puts a server in every Claude session forever. Most don't belong there.
Use --scope project for project-specific work, --scope user only for cross-cutting (filesystem, git, GitHub)
My setup: 6 active MCPs across 4 different scopes. Any single Claude Code session sees 2-3 of them.
Result: -8k input tokens per turn on average, because most sessions don't load all 6 servers.

P - Prefer keyword ranking over embeddings

Cosine similarity over tool descriptions sounds smart, fails on short structured text
My eval (200 queries, same as above): BM25 = 81% top-1, semantic embeddings = 64%, hybrid = 78%
This is opposite of document RAG defaults. Tool descriptions are not paragraphs.
Result: better selection accuracy AND no embedding API cost AND offline ranking.

E - Eject Docker if you can

If your gateway runs as a separate service (Docker, sidecar, sidecar-as-a-service), you've added an ops surface you don't need
In-process libs that compile-in (Rust + NAPI-RS in the case I'm running, Ratel) collapse this to zero ops
Result on my setup: no service to monitor, no port to expose, install is pnpm add -g @ ratel-ai/cli + one command (ratel mcp import).

Worked example from last week

Before SCOPE: cold start 41k input tokens. Tool-selection accuracy on a known-correct query set: 71%. Average response time 4.8 seconds.

After SCOPE: cold start 4.1k input tokens. Tool-selection accuracy: 94%. Average response time 1.9 seconds.

10x token reduction, 23-point accuracy gain, 2.5x latency improvement. Numbers from my own usage, not a vendor benchmark.

Notes on the math

These results are specific to a Claude Code + MCP setup. If you're not using MCP, the description-strip and gateway points still apply (any agent loop with N tools has the same problem). The scope point is Claude-Code-specific.

The first three are free. Anyone with ~/.claude.json write access can ship them today. The fourth and fifth need either a gateway library or rolling your own ranking.

I'd be curious what other people are measuring, especially anyone running 5+ MCPs in production. What's your cold-start token cost?

After 6 months of tuning my Claude Code MCP setup, I found 5 patterns that actually save tokens

Why is every "context layer" tool lying about token savings?