
Reducing Context Window Usage in MCP: Here’s the Approach
TL;DR: Context bloat in MCP comes from loading too many tools. Use discovery + higher-level abstractions instead.
Why MCP runs into context issues
If you’re not familiar with patterns like tool use or programmatic tool calling, this might give some useful context.
One problem we keep running into with MCP is context bloat.
MCP works nicely when you connect one or two servers. But once you start adding GitHub, Notion, Slack, Gmail, Linear, etc., the model suddenly has to deal with a huge list of tools, schemas, descriptions, parameters, and edge cases.
The impact
At that point, the context window starts getting used for tool definitions instead of the actual task.
The result is usually:
- too many tools loaded upfront
- slower tool selection
- more expensive LLM calls
- more chances for the model to pick the wrong tool
- simple workflows turning into long tool-calling loops
Current workarounds
A lot of people already work around this with CLIs.
For example, instead of giving the model 50 GitHub tools, you let it use gh. Instead of exposing every cloud operation as a separate tool, you let it use vercel, supabase, kubectl, aws, etc.
That works because the model does not need every possible action in context. It just needs a smaller programmable interface.
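As a rough illustration of the CLI workaround, here is a minimal sketch of a single tool that runs allowlisted commands. The function name run_cli, the allowlist contents, and the timeout are assumptions made for this example, not part of any MCP client or SDK.

```python
import shlex
import subprocess

# Hypothetical allowlist: the only binaries the model may invoke through this tool.
ALLOWED_BINARIES = frozenset({"gh", "kubectl", "vercel", "supabase", "aws"})


def run_cli(command: str, timeout: float = 30.0,
            allowed: frozenset = ALLOWED_BINARIES) -> str:
    """Run one allowlisted CLI command and return its text output.

    The model sees a single tool with a single string parameter instead of
    dozens of per-operation tool definitions.
    """
    argv = shlex.split(command)
    if not argv or argv[0] not in allowed:
        raise ValueError(f"binary not allowed: {argv[:1]}")
    # No shell=True: the command is executed as an argument vector, not a shell string.
    completed = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout, check=False
    )
    # Return stderr on failure so the model can read the error and retry.
    return completed.stdout if completed.returncode == 0 else completed.stderr


# Example call (requires the GitHub CLI to be installed and authenticated):
# print(run_cli("gh issue list --limit 5 --json number,title"))
```

The trade-off is that the model has to already know the CLI's syntax, which tends to hold for popular tools like gh or kubectl and less so for niche ones.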
The pattern that seems to be emerging
I think people are moving toward a similar pattern for MCP itself.
Instead of loading hundreds of tools directly into the model, you expose a few higher-level tools like:
- search available servers
- list tools for one server
- inspect a tool schema
- call a selected tool
- run a script/program which might chain multiple tools and return the final result
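The first four meta-tools in the list above can be sketched roughly as follows. This is a minimal in-memory version: REGISTRY, the helper _tool, and the two fake servers are hypothetical stand-ins, and a real router would forward calls to connected MCP servers over a transport instead of calling local lambdas.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], Any]


def _tool(description: str, required: list[str], handler: Handler) -> dict[str, Any]:
    """Build a hypothetical tool entry with an MCP-style inputSchema."""
    return {
        "description": description,
        "inputSchema": {
            "type": "object",
            "properties": {name: {"type": "string"} for name in required},
            "required": required,
        },
        "handler": handler,
    }


# Assumed registry: server name -> tool name -> tool entry. Canned handlers only.
REGISTRY: dict[str, dict[str, dict[str, Any]]] = {
    "github": {
        "search_issues": _tool(
            "Search issues by keyword", ["query"],
            lambda a: [{"number": 42, "title": f"Match for {a['query']}"}],
        ),
    },
    "slack": {
        "post_message": _tool(
            "Post a message to a channel", ["channel", "text"],
            lambda a: {"ok": True, "channel": a["channel"]},
        ),
    },
}


def search_servers(keyword: str) -> list[str]:
    """Meta-tool 1: find servers whose name or tool descriptions mention the keyword."""
    needle = keyword.lower()
    return sorted(
        server for server, tools in REGISTRY.items()
        if needle in server
        or any(needle in t["description"].lower() for t in tools.values())
    )


def list_tools(server: str) -> list[dict[str, str]]:
    """Meta-tool 2: names and one-line descriptions only, no schemas."""
    return [{"name": n, "description": t["description"]}
            for n, t in REGISTRY[server].items()]


def inspect_tool(server: str, tool: str) -> dict[str, Any]:
    """Meta-tool 3: load the full schema for one tool, only when it is needed."""
    return REGISTRY[server][tool]["inputSchema"]


def call_tool(server: str, tool: str, args: dict[str, Any]) -> Any:
    """Meta-tool 4: check required arguments, then route to the selected tool."""
    entry = REGISTRY[server][tool]
    missing = [k for k in entry["inputSchema"]["required"] if k not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return entry["handler"](args)
```

With this shape, only the four function signatures need to sit in the model's context up front; the schema for post_message is fetched through inspect_tool only if the model decides it needs Slack.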
How this shows up in practice
FastMCP has also started supporting patterns in this direction with proxying, tool transformation, and metadata around tools. The idea is similar: don’t treat the MCP tool list as a flat thing that must always be dumped into the model. Add a layer that can filter, reshape, or route tools.
Antigravity has another interesting approach. From what I’ve seen, connected MCP servers can look more like a filesystem. If you connect something like Exa, there can be an exa directory with tool names represented like files. Then the model does not call every Exa tool directly from a huge global list. It uses a special routing tool to call the actual MCP server tool behind the scenes.
That is a bit different from normal MCP clients, but the pattern is the same:
make tools discoverable, not always-loaded.
So instead of: model sees 500 tools → picks one → calls it → repeats
You get: model discovers capability → inspects what it needs → executes → gets result
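To get a feel for the difference in upfront payload, the sketch below compares the serialized size of 500 synthetic tool definitions against four meta-tool definitions. Everything here is made up for illustration: the tool shapes, the server list, and the use of JSON character count as a crude proxy for tokens. Real definitions are often longer than these, so treat the output as a relative comparison only.

```python
import json


def make_tool(server: str, index: int) -> dict:
    """Build a synthetic tool definition (name, description, inputSchema)."""
    return {
        "name": f"{server}_action_{index}",
        "description": f"Performs action {index} on {server}. Accepts filters and pagination.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search text"},
                "limit": {"type": "integer", "description": "Max results"},
            },
            "required": ["query"],
        },
    }


def make_meta_tool(name: str, description: str, params: list) -> dict:
    """Build a meta-tool definition whose parameters are all plain strings."""
    return {
        "name": name,
        "description": description,
        "inputSchema": {
            "type": "object",
            "properties": {p: {"type": "string"} for p in params},
            "required": params,
        },
    }


SERVERS = ["github", "notion", "slack", "gmail", "linear"]

# Flat approach: every tool from every server is loaded up front (5 x 100 = 500).
flat_tools = [make_tool(server, i) for server in SERVERS for i in range(100)]

# Discovery approach: only the meta-tools are loaded up front.
meta_tools = [
    make_meta_tool("search_servers", "Search available servers", ["keyword"]),
    make_meta_tool("list_tools", "List tools for one server", ["server"]),
    make_meta_tool("inspect_tool", "Inspect a tool schema", ["server", "tool"]),
    make_meta_tool("call_tool", "Call a selected tool", ["server", "tool", "args_json"]),
]


def payload_chars(tools: list) -> int:
    """Serialized size in characters, used as a crude stand-in for token count."""
    return len(json.dumps(tools))


flat_size = payload_chars(flat_tools)
meta_size = payload_chars(meta_tools)
print(f"flat: {flat_size} chars, meta: {meta_size} chars, ratio: {flat_size / meta_size:.0f}x")
```

The discovery approach still pays for schemas it inspects later, but only for the handful of tools a task actually touches.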
Where this actually makes a difference
This feels especially useful for tasks like:
- Multi-step tasks with three or more dependent tool calls, e.g., find a GitHub issue, check related PRs, and post a Slack update.
- Filtering, sorting, or transforming tool results.
- Tasks where the agent doesn't need to see and reason about intermediate tool results.
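The GitHub issue example from the first bullet could look roughly like this when run as a single script. This is a sketch under assumptions: call_tool is a hypothetical stand-in that returns canned data, and the tool names (search_issues, list_linked_prs, post_message) are invented for the example, not real GitHub or Slack MCP tool names.

```python
from typing import Any


def call_tool(server: str, tool: str, args: dict[str, Any]) -> Any:
    """Hypothetical router stub: returns canned data instead of hitting real servers."""
    if (server, tool) == ("github", "search_issues"):
        return [{"number": 42, "title": "Login fails on Safari"}]
    if (server, tool) == ("github", "list_linked_prs"):
        return [
            {"number": 101, "state": "open", "issue": 42},
            {"number": 97, "state": "closed", "issue": 42},
            {"number": 88, "state": "open", "issue": 7},
        ]
    if (server, tool) == ("slack", "post_message"):
        return {"ok": True, "channel": args["channel"], "text": args["text"]}
    raise KeyError(f"unknown tool {server}.{tool}")


def issue_status_update(query: str, channel: str) -> dict[str, Any]:
    """Chain three dependent tool calls and return only a short final result."""
    # Step 1: find the issue.
    issues = call_tool("github", "search_issues", {"query": query})
    if not issues:
        return {"posted": False, "reason": "no matching issue"}
    issue = issues[0]

    # Step 2: fetch PRs, then filter and sort in code instead of in the model's context.
    prs = call_tool("github", "list_linked_prs", {"issue": issue["number"]})
    open_prs = sorted(
        pr["number"] for pr in prs
        if pr["issue"] == issue["number"] and pr["state"] == "open"
    )

    # Step 3: post the summary. Intermediate results never reach the model.
    text = f"Issue #{issue['number']} ({issue['title']}): {len(open_prs)} open PR(s) {open_prs}"
    result = call_tool("slack", "post_message", {"channel": channel, "text": text})
    return {"posted": result["ok"], "summary": text}


print(issue_status_update("login", "#dev"))
```

The model sees one call and one small dictionary back, instead of three round trips with full issue and PR payloads in between.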
What I built around this
I’ve been building around this idea too. I made an MCP Assistant server that provides access to 100+ MCP servers like GitHub, Notion, Zapier, Supabase, Exa, etc.
You connect your MCP server at https://mcp-assistant.in/mcp
The MCP server at https://api.mcp-assistant.in/mcp uses a ToolRouter that exposes meta-tools for dynamic MCP discovery, plus a CodeMode tool that executes programs inside a sandbox for workflow execution and result processing. The goal is to avoid expensive LLM tool-calling loops where possible.