u/Defiant-Future-818

Reducing Context Window Efficiently in MCP — Here’s the Approach
▲ 67 r/mcp

TL;DR: Context bloat in MCP comes from loading too many tools. Use discovery + higher-level abstractions instead.

Why MCP runs into context issues

If you’re not familiar with patterns like tool use or programmatic tool calling, this might give some useful context.

One problem we keep running into with MCP is context bloat.

MCP works nicely when you connect one or two servers. But once you start adding GitHub, Notion, Slack, Gmail, Linear, etc., the model suddenly has to deal with a huge list of tools, schemas, descriptions, parameters, and edge cases.

The impact

At that point, the context window starts getting used for tool definitions instead of the actual task.

The result is usually:

  • too many tools loaded upfront
  • slower tool selection
  • more expensive LLM calls
  • more chances for the model to pick the wrong tool
  • simple workflows turning into long tool-calling loops

Current workarounds

A lot of people already work around this with CLIs.

For example, instead of giving the model 50 GitHub tools, you let it use gh. Instead of exposing every cloud operation as a separate tool, you let it use vercel, supabase, kubectl, aws, etc.

That works because the model does not need every possible action in context. It just needs a smaller programmable interface.
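To make the CLI idea concrete, here is a minimal sketch of one tool standing in for dozens of per-action tools. The tool name `run_cli`, the `CliRequest` shape, and `planCliCall` are all made up for illustration, and the sketch only validates the request and builds the argv. Actually spawning the process (with sandboxing and timeouts) is left out.

```typescript
// Hypothetical request shape for a single CLI-backed tool.
interface CliRequest {
  binary: string; // e.g. "gh"
  args: string[]; // e.g. ["issue", "list", "--limit", "5"]
}

// Allow-list built from the CLIs mentioned in the post.
const ALLOWED_BINARIES = ["gh", "vercel", "supabase", "kubectl", "aws"];

// The only tool definition the model has to keep in context.
const runCliTool = {
  name: "run_cli",
  description: "Run an allow-listed CLI. Use `<binary> --help` to discover subcommands.",
  inputSchema: {
    type: "object",
    properties: {
      binary: { type: "string", enum: ALLOWED_BINARIES },
      args: { type: "array", items: { type: "string" } },
    },
    required: ["binary", "args"],
  },
};

// Validate a request and return the argv to hand to a process spawner.
function planCliCall(req: CliRequest): string[] {
  if (ALLOWED_BINARIES.indexOf(req.binary) === -1) {
    throw new Error("binary not allowed: " + req.binary);
  }
  return [req.binary].concat(req.args);
}

const argv = planCliCall({ binary: "gh", args: ["issue", "list", "--limit", "5"] });
console.log(argv.join(" ")); // gh issue list --limit 5
```

The model learns subcommands on demand through `--help` output instead of carrying every schema upfront, which is the same trade the discovery pattern makes.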

The pattern that seems to be emerging

I think people are moving toward a similar pattern for MCP itself.

Instead of loading hundreds of tools directly into the model, you expose a few higher-level tools like:

  • search available servers
  • list tools for one server
  • inspect a tool schema
  • call a selected tool
  • run a script/program which might chain multiple tools and return the final result
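A rough in-memory sketch of the first four meta-tools in the list above. This is not the actual ToolRouter API from mcp-ts; the registry contents, the function names (`searchServers`, `listTools`, `inspectTool`, `callTool`), and the stub handlers are assumptions for illustration. A real router would fill the registry from connected MCP servers.

```typescript
interface ToolDef {
  name: string;
  description: string;
  inputSchema: object;
  handler: (args: Record<string, unknown>) => unknown;
}

interface ServerEntry {
  description: string;
  tools: ToolDef[];
}

// Hypothetical registry with stub handlers instead of real MCP connections.
const registry: Record<string, ServerEntry> = {
  github: {
    description: "Issues, pull requests, repositories",
    tools: [{
      name: "get_issue",
      description: "Fetch one issue by number",
      inputSchema: { type: "object", properties: { number: { type: "number" } } },
      handler: (args) => ({ number: args.number, title: "stub issue" }),
    }],
  },
  slack: {
    description: "Channels and messages",
    tools: [{
      name: "post_message",
      description: "Post a message to a channel",
      inputSchema: { type: "object", properties: { channel: { type: "string" }, text: { type: "string" } } },
      handler: (args) => ({ ok: true, channel: args.channel }),
    }],
  },
};

function getServer(serverId: string): ServerEntry {
  if (!Object.prototype.hasOwnProperty.call(registry, serverId)) {
    throw new Error("unknown server: " + serverId);
  }
  return registry[serverId];
}

function getTool(serverId: string, toolName: string): ToolDef {
  const matches = getServer(serverId).tools.filter((t) => t.name === toolName);
  if (matches.length === 0) throw new Error("unknown tool: " + serverId + "/" + toolName);
  return matches[0];
}

// Meta-tool 1: search available servers by id or description.
function searchServers(query: string): string[] {
  const q = query.toLowerCase();
  return Object.keys(registry).filter((id) =>
    (id + " " + registry[id].description).toLowerCase().indexOf(q) !== -1);
}

// Meta-tool 2: list tool names for one server (names only, no schemas).
function listTools(serverId: string): string[] {
  return getServer(serverId).tools.map((t) => t.name);
}

// Meta-tool 3: load a single schema only when the model asks for it.
function inspectTool(serverId: string, toolName: string): object {
  return getTool(serverId, toolName).inputSchema;
}

// Meta-tool 4: call the selected tool.
function callTool(serverId: string, toolName: string, args: Record<string, unknown>): unknown {
  return getTool(serverId, toolName).handler(args);
}

const found = searchServers("issues");
const toolNames = listTools("github");
const schema = inspectTool("github", "get_issue");
const issue = callTool("github", "get_issue", { number: 42 });
console.log(found, toolNames, JSON.stringify(schema), JSON.stringify(issue));
```

Only these four definitions sit in the model's context upfront; each individual schema is paid for when it is inspected.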

How this shows up in practice

FastMCP has also started supporting patterns in this direction with proxying, tool transformation, and metadata around tools. The idea is similar: don’t treat the MCP tool list as a flat thing that must always be dumped into the model. Add a layer that can filter, reshape, or route tools.
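A generic sketch of what a filter-and-reshape layer could look like. This is not FastMCP's API (FastMCP is a separate library with its own interfaces); the `ToolSummary` shape, the tags, and both functions are assumptions made up for illustration.

```typescript
interface ToolSummary {
  name: string;
  description: string;
  tags: string[];
}

// Hypothetical upstream tool list, as a proxy layer might see it.
const upstreamTools: ToolSummary[] = [
  { name: "create_issue", description: "Create a new issue in a repository with title, body, labels, and assignees", tags: ["github", "write"] },
  { name: "list_issues", description: "List issues in a repository with optional state, label, and assignee filters", tags: ["github", "read"] },
  { name: "send_message", description: "Send a message to a channel", tags: ["slack", "write"] },
];

// Filter: keep only tools that carry every required tag.
function filterByTags(tools: ToolSummary[], required: string[]): ToolSummary[] {
  return tools.filter((t) => required.every((tag) => t.tags.indexOf(tag) !== -1));
}

// Reshape: namespace the tool name and trim long descriptions to save tokens.
function reshape(tools: ToolSummary[], namespace: string, maxDescription: number): ToolSummary[] {
  return tools.map((t) => ({
    name: namespace + "_" + t.name,
    description: t.description.length > maxDescription
      ? t.description.slice(0, maxDescription) + "..."
      : t.description,
    tags: t.tags,
  }));
}

// Expose only read-only GitHub tools, with shortened descriptions.
const exposed = reshape(filterByTags(upstreamTools, ["github", "read"]), "gh", 40);
console.log(exposed.map((t) => t.name).join(", ")); // gh_list_issues
```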

Antigravity has another interesting approach. From what I’ve seen, connected MCP servers can look more like a filesystem. If you connect something like Exa, there can be an exa directory with tool names represented like files. Then the model does not call every Exa tool directly from a huge global list. It uses a special routing tool to call the actual MCP server tool behind the scenes.

That is a bit different from normal MCP clients, but the pattern is the same:

make tools discoverable, not always-loaded.

So instead of: model sees 500 tools → picks one → calls → repeats

You get: model discovers capability → inspects what it needs → executes → gets result
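The filesystem-style layout can be sketched with a directory listing function plus one routing tool. This is only my reading of the pattern, not how Antigravity implements it; the tool names under `exa` are placeholders, and `listDir` and `routeCall` are hypothetical.

```typescript
type Handler = (args: Record<string, unknown>) => unknown;

// Hypothetical layout: one directory per server, one "file" per tool.
const toolTree: Record<string, Record<string, Handler>> = {
  exa: {
    search: (args) => ({ query: args.query, hits: [] }),
    fetch_page: (args) => ({ url: args.url, text: "" }),
  },
};

// Own-property lookup so keys like "toString" never resolve to inherited members.
function own<T>(obj: Record<string, T>, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : undefined;
}

// "ls": an empty path lists servers, a server name lists that server's tools.
function listDir(path: string): string[] {
  if (path === "") return Object.keys(toolTree);
  const dir = own(toolTree, path);
  if (!dir) throw new Error("no such directory: " + path);
  return Object.keys(dir);
}

// The single routing tool: resolve "server/tool" and forward the arguments.
function routeCall(path: string, args: Record<string, unknown>): unknown {
  const parts = path.split("/");
  if (parts.length !== 2) throw new Error("expected server/tool, got: " + path);
  const dir = own(toolTree, parts[0]);
  const handler = dir ? own(dir, parts[1]) : undefined;
  if (!handler) throw new Error("no such tool: " + path);
  return handler(args);
}

console.log(listDir(""));    // [ 'exa' ]
console.log(listDir("exa")); // [ 'search', 'fetch_page' ]
const routed = routeCall("exa/search", { query: "mcp context bloat" });
console.log(JSON.stringify(routed));
```

The model's global tool list stays at two entries no matter how many servers are mounted under the tree.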

Where this actually makes a difference

This feels especially useful for tasks like:

  • Multi-step tasks with three or more dependent tool calls, e.g., find a GitHub issue, check related PRs, and post a Slack update.
  • Filtering, sorting, or transforming tool results.
  • Tasks where the agent doesn't have to see and reason about intermediate tool results.
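Here is a sketch of the GitHub-to-Slack example from the list above written as one program, the way a code-execution tool might receive it. The three tool functions are stubs with made-up data, not real GitHub or Slack calls, and `runWorkflow` is a hypothetical name. The point is that the PR list is filtered and sorted inside the program, so the model only sees the final string.

```typescript
interface Issue { number: number; title: string; }
interface PullRequest { number: number; title: string; state: string; linkedIssue: number; }

// Counts stub invocations so the number of tool calls is visible.
let toolCalls = 0;

// Stub tools standing in for real MCP calls; data is hypothetical.
function findIssue(query: string): Issue {
  toolCalls++;
  return { number: 42, title: "Result for " + query };
}

function listPullRequests(): PullRequest[] {
  toolCalls++;
  return [
    { number: 103, title: "Add Safari regression test", state: "open", linkedIssue: 42 },
    { number: 102, title: "Update README", state: "merged", linkedIssue: 7 },
    { number: 101, title: "Fix Safari cookie handling", state: "open", linkedIssue: 42 },
  ];
}

function postSlackMessage(channel: string, text: string): { ok: boolean; channel: string; text: string } {
  toolCalls++;
  return { ok: true, channel: channel, text: text };
}

// The program the model would submit. Intermediate results never leave it.
function runWorkflow(): string {
  const issue = findIssue("login safari");
  const related = listPullRequests()
    .filter((pr) => pr.linkedIssue === issue.number && pr.state === "open")
    .sort((a, b) => a.number - b.number);
  const text = "Issue #" + issue.number + " has " + related.length + " open PRs: " +
    related.map((pr) => "#" + pr.number).join(", ");
  const sent = postSlackMessage("#eng-updates", text);
  return sent.ok ? text : "failed to post update";
}

const summary = runWorkflow();
console.log(summary); // Issue #42 has 2 open PRs: #101, #103
```

Three tool calls happen, but the model pays for one round trip and one short result instead of three turns with full intermediate payloads in context.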

What I built around this

I’ve been building around this idea too. I made an MCP Assistant server that provides access to 100+ MCP servers like GitHub, Notion, Zapier, Supabase, Exa, etc.

You connect your MCP server at https://mcp-assistant.in/mcp

The MCP server at https://api.mcp-assistant.in/mcp uses a ToolRouter that exposes meta-tools for dynamic MCP discovery, plus a CodeMode tool that executes programs inside a sandbox for workflow execution and result processing. The goal is to avoid expensive LLM tool-calling loops where possible.

Repo: https://github.com/zonlabs/mcp-ts

u/Defiant-Future-818 — 14 hours ago

Most AI agents fail when you give them too many tools because the "system prompt" gets too long and the model gets confused (the "lost in the middle" problem).

I’ve been benchmarking an approach called ToolRouter that dynamically fetches tool schemas only when they are relevant, similar to the approach Claude Code uses internally. I managed to get a 128-tool environment down from 287,000 tokens to just 3,500 tokens of initial context.
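For scale, the arithmetic on those two numbers works out as follows. The variable names are mine; the inputs are the figures quoted above. Note this compares initial context only, since schemas fetched later during a task add tokens back.

```typescript
// Figures quoted in the post.
const toolCount = 128;
const tokensBefore = 287000;
const tokensAfter = 3500;

// Derived values.
const reductionPercent = (1 - tokensAfter / tokensBefore) * 100;
const shrinkFactor = tokensBefore / tokensAfter;
const avgTokensPerTool = tokensBefore / toolCount;

console.log(reductionPercent.toFixed(1) + "% smaller initial context"); // 98.8% smaller initial context
console.log(shrinkFactor + "x fewer tokens");                           // 82x fewer tokens
console.log(Math.round(avgTokensPerTool) + " tokens per tool on average before routing"); // 2242 ...
```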

https://preview.redd.it/hzlnxp2uvwxg1.png?width=895&format=png&auto=webp&s=ef18932fc5aed49bbaacecb82cdec1f6f6a47191

https://preview.redd.it/q22qhq2uvwxg1.png?width=435&format=png&auto=webp&s=127c5ab5826a7b17abbade49a05d65a153344d56

It’s part of the mcp-ts library I’m building. If you're struggling with agent context limits or high API costs, and you're already using the AG-UI protocol to bridge your agents to frontends like CopilotKit, this should help.

Check it out here: https://github.com/zonlabs/mcp-ts
Benchmark results: Link to benchmarks

u/Defiant-Future-818 — 26 days ago