u/BoxLegitimate9271

I've been running 5+ MCP servers across multiple agent sessions every day for months. My MCP server journey has been a bit of a roller-coaster: first I loved them (probably too much, judging by my token usage), then I joined the "MCP is dead" gang, and now I've finally landed on plain pragmatism. MCP is great in concept, but as most of us know, there are some rough edges that keep biting. Figured it's time to share what I kept running into and what I ended up building (and using) to fix it.

1. Tool bloat in the system prompt

Connect 5 MCP servers and suddenly your agent has 50-80 tool definitions in its system prompt/context. Every API turn then re-reads all of them, even when the agent only needs 2. That's thousands of tokens compounding over the session just to list tools. This is a well-known problem: some clients, like Claude Code, let you filter tools per server, but most don't (Codex, Cursor, Windsurf, Claude Desktop), and even Claude Code's filtering is manual and static.
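To see why this compounds, here is a back-of-the-envelope sketch that ignores any prompt caching the client or API may apply. `TOKENS_PER_TOOL_DEF` and the 30-turn session are hypothetical placeholders, not measurements. The point is only that the cost scales with tools × definition size × turns, regardless of how many tools the agent actually calls.

```python
# Rough model of tool-definition overhead. TOKENS_PER_TOOL_DEF is a
# hypothetical placeholder, not a measured value.
TOKENS_PER_TOOL_DEF = 100


def tool_listing_tokens(num_tools: int, turns: int,
                        tokens_per_def: int = TOKENS_PER_TOOL_DEF) -> int:
    """Tokens spent on tool definitions over a session.

    The full tool list is part of every request, so the cost grows with
    tools x definition size x turns, no matter how few tools are used.
    """
    return num_tools * tokens_per_def * turns


if __name__ == "__main__":
    for tools in (2, 50, 80):
        print(f"{tools:>2} tools: {tool_listing_tokens(tools, 1):,} tokens/turn, "
              f"{tool_listing_tokens(tools, 30):,} tokens over 30 turns")
```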

2. Sequential calls eat your context alive

Every tool call adds ~75 tokens of structural overhead plus ~150 tokens of the model announcing "now I'll fetch the next one" to nobody. 40 of those in a session add up to roughly 9,000 tokens of the agent talking to itself. CLI tool calls have the same problem (MCP is actually the more efficient of the two), but either way the context fills up with filler, sessions have to be compacted earlier, and output quality tanks as the model re-reads junk from 40 turns ago.
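Using the post's own approximate per-call figures (~75 + ~150 tokens), the overhead is easy to put a number on. This minimal sketch only does the arithmetic. It makes one simplifying assumption: a batched call costs about as much overhead as one ordinary call, so folding N calls into one saves N − 1 round trips. With 7 calls that works out to the ~1,350 tokens per batch quoted further down.

```python
# Approximate per-call overheads quoted in the post.
STRUCTURAL_TOKENS = 75    # framing for each tool call and its result
NARRATION_TOKENS = 150    # the model narrating "now I'll fetch the next one"
PER_CALL = STRUCTURAL_TOKENS + NARRATION_TOKENS  # ~225 tokens per round trip


def sequential_overhead(calls: int) -> int:
    """Overhead when every tool call is its own round trip."""
    return calls * PER_CALL


def batching_savings(calls_in_batch: int) -> int:
    """Overhead avoided by folding N calls into one batched call.

    Simplification: the batched call is assumed to cost about as much
    overhead as a single ordinary call, so N - 1 round trips are saved.
    """
    return max(calls_in_batch - 1, 0) * PER_CALL


print(sequential_overhead(40))  # 9000
print(batching_savings(7))      # 1350
```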

3. MCP server restarts kill your session

An MCP server disconnects, updates, changes its tool list, or crashes. In Codex and other clients, that means restarting the entire agent session; even in Claude Code you have to reconnect with /mcp to get access to the updated tools. Context, reasoning, progress: all gone. This happens way more often than it should and is a real productivity killer.
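The general remedy is a bridge that keeps the agent-facing connection stable and quietly re-establishes the downstream one. Below is a minimal sketch of that pattern. It assumes a synchronous client with hypothetical `list_tools`/`call_tool` methods and a `connect` factory. These are stand-ins, not the MCP SDK's API or callmux's implementation. On a `ConnectionError` the bridge reconnects with exponential backoff, refreshes the tool list, and retries the call once.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Protocol


class Connection(Protocol):
    """Hypothetical downstream connection; not a real MCP SDK type."""

    def list_tools(self) -> list[str]: ...
    def call_tool(self, name: str, args: dict[str, Any]) -> Any: ...


class ReconnectingBridge:
    """Keeps the agent-facing side stable while the downstream side restarts."""

    def __init__(self, connect: Callable[[], Connection],
                 max_attempts: int = 3, backoff_s: float = 0.1) -> None:
        self._connect = connect          # opens a fresh downstream connection
        self._max_attempts = max_attempts
        self._backoff_s = backoff_s
        self._conn = connect()
        self.tools = self._conn.list_tools()

    def _reconnect(self) -> None:
        last_err: Exception | None = None
        for attempt in range(self._max_attempts):
            try:
                self._conn = self._connect()
                self.tools = self._conn.list_tools()  # pick up changed tool lists
                return
            except ConnectionError as err:
                last_err = err
                time.sleep(self._backoff_s * 2 ** attempt)  # exponential backoff
        raise ConnectionError("downstream server unavailable") from last_err

    def call_tool(self, name: str, args: dict[str, Any]) -> Any:
        try:
            return self._conn.call_tool(name, args)
        except ConnectionError:
            # Downstream crashed or restarted: reconnect and retry once
            # instead of surfacing the failure to the agent session.
            self._reconnect()
            return self._conn.call_tool(name, args)


class FlakyServer:
    """Fake downstream: the first connection dies after one successful call."""

    def __init__(self, state: dict[str, int]) -> None:
        state["connects"] += 1
        self._first = state["connects"] == 1
        self._calls = 0

    def list_tools(self) -> list[str]:
        return ["search"]

    def call_tool(self, name: str, args: dict[str, Any]) -> Any:
        self._calls += 1
        if self._first and self._calls > 1:
            raise ConnectionError("server restarted")
        return f"{name}:{args['q']}"


if __name__ == "__main__":
    state = {"connects": 0}
    bridge = ReconnectingBridge(lambda: FlakyServer(state), backoff_s=0)
    print(bridge.call_tool("search", {"q": "a"}))  # search:a
    print(bridge.call_tool("search", {"q": "b"}))  # search:b (after a silent reconnect)
    print(state["connects"])                       # 2
```

Retrying only once keeps a genuinely dead server from looping forever. Blind retries are also only safe for tools whose calls are idempotent.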

4. Process explosion with multiple sessions

6 agent sessions × 5 MCP servers = 30 stdio processes and 4+ GB of RAM. Each session spawns its own copy of every server, and most of them just sit there idle.
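The scaling difference is simple arithmetic. In the per-session model the process count is sessions × servers. With a single shared multiplexer it should be roughly servers + 1, independent of how many sessions connect. The `+ 1` for the multiplexer itself is an assumption about how the 30 → 6 figure mentioned for shared mode breaks down; the post doesn't spell it out.

```python
def per_session_processes(sessions: int, servers: int) -> int:
    """Stdio mode: each agent session spawns its own copy of every server."""
    return sessions * servers


def shared_processes(servers: int) -> int:
    """Shared mode (assumed breakdown): one copy of each server + one multiplexer."""
    return servers + 1


if __name__ == "__main__":
    for sessions in (1, 6, 12):
        print(f"{sessions:>2} sessions: {per_session_processes(sessions, 5)} vs "
              f"{shared_processes(5)} processes")
```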

5. Existing solutions are server-side

Tools like Bifrost (which is great btw) touch on some of this, but they're hosted products or self-hosted infrastructure, not something you're going to deploy just to get tool scoping or call batching. There's no local control plane that sits between your agents and your MCP servers.

What I built: callmux, a local MCP multiplexer that sits between any agent and any MCP server. It wraps your existing MCP server configs transparently (no rewiring) or runs as a standalone shared daemon. It's pretty flexible, with lots of features, but most are optional, ship with sane defaults, and can be tweaked through configuration.

How it fixes each issue:

Tool bloat - Two modes: tool scope filtering, or a meta-only mode that hides all downstream tools and exposes 11 meta-tools instead. In meta-only mode the agent discovers tools via semantic search and calls them through callmux_call, so the system prompt size stays fixed no matter how many servers you add. It also brings per-server tool whitelisting to any client, even ones without native support.
Sequential calls - callmux_parallel, callmux_batch, callmux_pipeline: 10 sequential calls become 1. My tool call count dropped to ~15% of the original on average, saving about 1,350 tokens per batch of 7.
Session death - A small (optional) callmux stdio bridge auto-reconnects when downstream servers hiccup or their tools change, and the agent session never notices. Server configs hot-reload without restarting anything. This is especially wonderful when developing and testing your own MCP servers: it just works.
Process explosion - Shared server mode (optional): run callmux once and all sessions connect to it over HTTP. That's 30 processes down to 6, plus a cache shared across sessions.
Local, not hosted - MIT-licensed; install with npm install -g callmux or just use npx. The whole point: your machine, your data.
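To make the meta-only idea concrete, here is a minimal, dependency-free sketch. The registry keeps downstream tools out of the prompt and exposes only a search entry point and a call entry point. An optional per-server whitelist is applied at registration. All names (`MetaRegistry`, `search`, `call`, the `server/tool` reference format) are hypothetical and not callmux's actual API. Plain keyword overlap stands in for the semantic search mentioned above.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Tool:
    server: str
    name: str
    description: str
    handler: Callable[..., Any]


def _words(text: str) -> set[str]:
    """Lowercased word set; underscores split names like read_file."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class MetaRegistry:
    """Hypothetical meta-only layer: the agent sees `search` and `call`,
    never the full downstream tool list."""

    tools: dict[tuple[str, str], Tool] = field(default_factory=dict)
    # Optional per-server whitelist; servers without an entry allow everything.
    allow: dict[str, set[str]] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        allowed = self.allow.get(tool.server)
        if allowed is None or tool.name in allowed:
            self.tools[(tool.server, tool.name)] = tool

    def search(self, query: str, k: int = 3) -> list[str]:
        """Rank tools by word overlap with the query (stand-in for embeddings)."""
        q = _words(query)
        scored = [(len(q & _words(f"{t.name} {t.description}")), f"{t.server}/{t.name}")
                  for t in self.tools.values()]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
        return [ref for _, ref in ranked[:k]]

    def call(self, ref: str, **args: Any) -> Any:
        """Dispatch a 'server/tool' reference to the downstream handler."""
        server, name = ref.split("/", 1)
        return self.tools[(server, name)].handler(**args)


if __name__ == "__main__":
    reg = MetaRegistry(allow={"github": {"list_issues"}})
    reg.register(Tool("github", "list_issues", "List open issues in a repository",
                      lambda repo: [f"{repo}#1"]))
    reg.register(Tool("github", "delete_repo", "Delete a repository",
                      lambda repo: None))  # filtered out by the whitelist
    reg.register(Tool("fs", "read_file", "Read a file from disk",
                      lambda path: f"<contents of {path}>"))
    print(reg.search("open issues in repo"))                # ['github/list_issues']
    print(reg.call("github/list_issues", repo="acme/app"))  # ['acme/app#1']
```

The agent only ever sees the fixed entry points, so adding a server grows the registry, not the system prompt.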
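As an illustration of why parallel calls help, here is a minimal asyncio sketch of a fan-out. Several independent tool calls run concurrently behind what the agent sees as a single call. `run_parallel` and `fake_fetch` are hypothetical names. This shows the general technique, assuming `callmux_parallel` works roughly along these lines; it is not its actual implementation.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable

ToolFn = Callable[..., Awaitable[Any]]


async def run_parallel(calls: list[tuple[ToolFn, dict[str, Any]]]) -> list[Any]:
    """Run independent tool calls concurrently; results come back in input order.

    To the agent this is one tool call, so the per-call overhead is paid once.
    return_exceptions=True keeps one failing call from sinking the whole batch.
    """
    return list(await asyncio.gather(*(fn(**args) for fn, args in calls),
                                     return_exceptions=True))


async def fake_fetch(url: str) -> str:
    """Stand-in for a slow downstream tool."""
    await asyncio.sleep(0.05)  # simulated I/O latency
    return f"body of {url}"


async def main() -> None:
    urls = [f"https://example.com/page/{i}" for i in range(7)]
    loop = asyncio.get_running_loop()
    start = loop.time()
    results = await run_parallel([(fake_fetch, {"url": u}) for u in urls])
    elapsed = loop.time() - start
    # The seven 50 ms calls overlap, so this takes ~50 ms rather than ~350 ms.
    print(len(results), f"{elapsed:.2f}s")


if __name__ == "__main__":
    asyncio.run(main())
```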

Other stuff in there (all optional, of course): an interactive setup wizard, response caching with TTL, a read-only live dashboard, recipes (multi-step workflows you define once and call by name), dry-run mode, enterprise security (auth, RBAC, rate limiting, CIDR allowlists, audit logging, Prometheus metrics, OIDC), config hot-reload, systemd/launchd daemon install, file references for long arguments, and result pagination for large responses.
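Response caching with a TTL is a standard pattern. As a rough illustration, the sketch below stores a downstream result under a key built from server, tool name, and sorted arguments. Once an entry is older than `ttl` seconds, the next lookup calls downstream again. `TTLCache` and `cache_key` are hypothetical names, and callmux's cache may work differently.

```python
from __future__ import annotations

import json
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Generic TTL cache for tool responses (a sketch, not callmux's code)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[Hashable, tuple[float, Any]] = {}

    def get_or_call(self, key: Hashable, fn: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: skip the downstream call
        value = fn()               # missing or expired: call downstream
        self._store[key] = (now + self.ttl, value)
        return value


def cache_key(server: str, tool: str, args: dict[str, Any]) -> str:
    """Stable key: identical arguments in any order map to the same entry."""
    return json.dumps([server, tool, args], sort_keys=True)


if __name__ == "__main__":
    now = [0.0]
    cache = TTLCache(ttl=30, clock=lambda: now[0])
    calls: list[int] = []
    key = cache_key("github", "list_issues", {"repo": "acme/app"})

    def fetch() -> list[str]:
        calls.append(1)            # count real downstream calls
        return ["acme/app#1"]

    cache.get_or_call(key, fetch)
    cache.get_or_call(key, fetch)  # served from cache
    now[0] = 31.0                  # jump past the TTL
    cache.get_or_call(key, fetch)  # expired, so downstream is called again
    print(len(calls))              # 2
```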
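Recipes can be pictured as named lists of steps, where each step builds its arguments from the inputs and from earlier results. The sketch below is a hypothetical, dependency-free illustration of that idea. `Step`, `RecipeRunner`, and the `save_as` convention are invented for the example. The post doesn't describe callmux's actual recipe format, which may look quite different.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Step:
    tool: str                                                # downstream tool to call
    build_args: Callable[[dict[str, Any]], dict[str, Any]]   # args from earlier results
    save_as: str                                             # key for this step's result


class RecipeRunner:
    """Hypothetical sketch: define a multi-step workflow once, run it by name."""

    def __init__(self, tools: dict[str, Callable[..., Any]]) -> None:
        self.tools = tools
        self.recipes: dict[str, list[Step]] = {}

    def define(self, name: str, steps: list[Step]) -> None:
        self.recipes[name] = steps

    def run(self, name: str, **inputs: Any) -> dict[str, Any]:
        ctx: dict[str, Any] = dict(inputs)   # inputs plus every step's result
        for step in self.recipes[name]:
            ctx[step.save_as] = self.tools[step.tool](**step.build_args(ctx))
        return ctx


if __name__ == "__main__":
    runner = RecipeRunner({
        "get_issue": lambda number: {"number": number, "title": "crash on start"},
        "search_code": lambda query: [f"src/main.py: {query}"],
    })
    runner.define("triage", [
        Step("get_issue", lambda c: {"number": c["issue"]}, "issue_data"),
        Step("search_code", lambda c: {"query": c["issue_data"]["title"]}, "hits"),
    ])
    print(runner.run("triage", issue=42)["hits"])  # ['src/main.py: crash on start']
```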

callmux works with Claude Code, Codex, Claude Desktop, Cursor, Windsurf, and pretty much anything else that speaks MCP over stdio or HTTP.

npx -y callmux setup

I hope that you find it as useful as I do, and contributions are welcome. Happy to answer questions and hear what you think of this.

5 practical problems with MCP right now (and a local tool that fixes them)