
After 6 months of tuning my Claude Code MCP setup, I found 5 patterns that actually save tokens
I'm a senior backend engineer using Claude Code as my daily driver since November. I added MCP servers, hated my context bar, started instrumenting everything. After ~600 hours of usage I distilled the savings down to five patterns. Calling it the SCOPE rule.
Numbers below are from my own setup (Sonnet, 6 active MCPs, ~110 tools at peak), measured across roughly 4,000 turns.
S - Strip tool descriptions
- Bad: ship the MCP author's marketing copy as-is
- Good: rewrite every tool's description to one sentence, verb-led, action-clear
- Example: "Search across all your Slack channels and DMs to find messages matching natural language queries with full filtering support" → "Search Slack messages by query string"
- Result on my setup: -11k input tokens per cold-start turn. ~30% of total MCP overhead came from description bloat alone.
C - Cap visible tools at 20
- Past 20 tools in context, model accuracy on tool-selection drops measurably
- My eval (200 fixed queries): 94% accuracy at 18 tools, 71% accuracy at 110 tools
- The "fix" isn't a smarter model. It's fewer visible tools. Past 20, you need a gateway pattern.
- Result: 23-point accuracy improvement, also tokens drop because only top-K loads.
O - One-scope-per-purpose
--scope userputs a server in every Claude session forever. Most don't belong there.- Use
--scope projectfor project-specific work,--scope useronly for cross-cutting (filesystem, git, GitHub) - My setup: 6 active MCPs across 4 different scopes. Any single Claude Code session sees 2-3 of them.
- Result: -8k input tokens per turn on average, because most sessions don't load all 6 servers.
P - Prefer keyword ranking over embeddings
- Cosine similarity over tool descriptions sounds smart, fails on short structured text
- My eval (200 queries, same as above): BM25 = 81% top-1, semantic embeddings = 64%, hybrid = 78%
- This is opposite of document RAG defaults. Tool descriptions are not paragraphs.
- Result: better selection accuracy AND no embedding API cost AND offline ranking.
E - Eject Docker if you can
- If your gateway runs as a separate service (Docker, sidecar, sidecar-as-a-service), you've added an ops surface you don't need
- In-process libs that compile-in (Rust + NAPI-RS in the case I'm running, Ratel) collapse this to zero ops
- Result on my setup: no service to monitor, no port to expose, install is
pnpm add -g @ ratel-ai/cli+ one command (ratel mcp import).
Worked example from last week
Before SCOPE: cold start 41k input tokens. Tool-selection accuracy on a known-correct query set: 71%. Average response time 4.8 seconds.
After SCOPE: cold start 4.1k input tokens. Tool-selection accuracy: 94%. Average response time 1.9 seconds.
10x token reduction, 23-point accuracy gain, 2.5x latency improvement. Numbers from my own usage, not a vendor benchmark.
Notes on the math
These results are specific to a Claude Code + MCP setup. If you're not using MCP, the description-strip and gateway points still apply (any agent loop with N tools has the same problem). The scope point is Claude-Code-specific.
The first three are free. Anyone with ~/.claude.json write access can ship them today. The fourth and fifth need either a gateway library or rolling your own ranking.
I'd be curious what other people are measuring, especially anyone running 5+ MCPs in production. What's your cold-start token cost?