r/modelcontextprotocol

I built an open-source, self-hosted AI gateway: 237 providers (90+ free), auto-fallback combos, and a 10-engine token-compression pipeline (MIT)
▲ 326 r/modelcontextprotocol+69 crossposts

I built an open-source, self-hosted AI gateway: 237 providers (90+ free), auto-fallback combos, and a 10-engine token-compression pipeline (MIT)

Builders-welcome post with the substance up front (disclosure: I'm the maintainer). OmniRoute is a free, MIT, self-hosted AI gateway — one OpenAI-compatible endpoint over 237 providers — built around two problems: runs dying on a provider 429, and tokens bleeding on tool/log output.

One endpoint, 237 providers — 90+ of them free. You point any tool or agent at a single OpenAI-compatible endpoint (localhost:20128/v1) and it can reach 237 LLM providers without you rewriting anything. 90+ have free tiers and 11 are free forever (no card), which aggregates to ~1.6B documented free tokens/month — and that's honest, pool-deduped math (we count each shared pool once instead of inflating it; the methodology is public in the repo). There's a one-command setup-* for 13+ coding tools (Claude Code, Codex, Cursor, Cline, Roo, Kilo, Gemini CLI…), so switching your existing setup over takes seconds.

Fallback combos — so it never stops mid-task. A "combo" is a ladder of models the router walks automatically: your subscription first, then API keys, then cheap models, then free ones. When a provider returns a 500 or you hit a rate limit, it slides to the next target in milliseconds, mid-request, and your tool never even sees the error. There are 17 routing strategies (priority, weighted, round-robin, cost-optimized, auto/coding:fast…) plus three resilience layers — a per-provider circuit breaker, a per-key cooldown, and a per-model lockout — so one dead key can't take down a whole provider.

Fusion — an ensemble mode for the hard steps. Beyond simple routing, there's a fusion strategy that fans a single prompt out to a panel of different models in parallel and then has a judge model synthesize one best answer (mixture-of-agents, built in). It's cost-aware, so easy turns stay on one fast model and it only fuses when the step is worth it.

A 10-engine compression pipeline — the part most routers don't have. Every request flows through a transparent compression pass you can toggle/stack per combo. Instead of one trick, it stacks the best of the open-source ecosystem: RTK filters command/tool output (git diffs, test logs, builds) at 60–90%, Microsoft's LLMLingua-2 does ML semantic pruning, Caveman handles prose, session-dedup strips repeats across turns. Critically, code, URLs and JSON are preserved byte-perfect, and a default-on inflation guard throws the compressed version away and sends the original if compressing would actually grow the prompt — it never makes things worse. On tool-heavy sessions that's ~89% average input-token reduction (an 8k-token git diff becomes a few hundred). Full credit to every upstream project (RTK, Caveman, LLMLingua-2, Troglodita) is in the README.

Agent-native — the agent can drive the router itself. There's a built-in MCP server (95 tools across 30 audited scopes, over stdio / SSE / streamable-HTTP), plus A2A (v0.3, JSON-RPC 2.0) support. That means an agent can query providers, switch combos, read its own remaining quota and manage memory through the gateway — not just consume tokens through it.

It's 100% local (zero telemetry, AES-256-GCM at rest), MIT-licensed, has a prompt-injection guard on every LLM route, opt-in memory, and runs on npm, Docker, desktop or your phone via Termux.

For context on whether it's worth your time: it's grown to ~9.8K GitHub stars, 1,490+ forks and 280+ contributors in ~4.5 months, with 21,000+ automated tests and 1,830+ issues closed — so it's a battle-tested project, not a brand-new experiment.

npm install -g omniroute

GitHub: https://github.com/diegosouzapw/OmniRoute · Site: https://omniroute.online

Would value a critique of the routing/compression architecture from this crowd.

u/ZombieGold5145 — 2 days ago

How are enterprise teams enforcing ai agent governance before regulation

What's the actual pattern for enforcing which agent can connect to which mcp server in an enterprise context?

Most setups I've seen use static credentials and trust the model layer not to do anything unexpected but that's not governance, it's optimism. I haven't found documentation on what a proper enterprise pattern looks like: per-agent credentials to specific tools, enforced at the connection level rather than the prompt level, with an audit trail that can answer which agent called which tool at what time with what input.

Static credentials are clearly wrong, what can replace them in practice?

reddit.com
u/Rich-Ambition-3111 — 5 days ago
▲ 7 r/modelcontextprotocol+3 crossposts

MCP Boundary v0.1.3 - boundary checks for MCP tool calls with real side effects (last build was broken, now fixed. Agent loop stopped in the shown example)

Follow-up to my post from two weeks ago. People downloaded it, but that build (v0.1.0) was broken - it likely wouldn't even start, and we didn't catch it at the time. It's fixed now (v0.1.3). Reposting for anyone who tried that version and wrote it off - it actually runs now.

>The problem we are trying to solve is: tool access is not the same as impact permission. A model or agent may be allowed to call a tool, but that does not always mean this specific write, delete, update, or retry should happen now.

A few things you can do with it:

  • restrict arguments, not whole tools (for example, allow sending only to approved recipient domains)
  • bind writes to observed state, so a write does not run if the world changed since the read
  • see every call, decision, reason, and outcome in a local dashboard

MCP Boundary checks each call against your policy and the current state before it hits the real system. It then allows it, blocks it, or asks the agent to refresh state - and when it blocks, the agent gets a structured reason it can act on, not just an error.

It runs locally, and wraps your existing command-based (stdio) MCP servers within 2 minutes.

It is not an enterprise gateway, a DLP system, or a prompt-injection detector, and it only covers calls routed through it.

I'm looking for feedback from people running MCP workflows with side effects - especially where the policy model is too strict or too loose for your setup.

Site: https://mcpboundary.com

Repo: https://github.com/impact-boundary-labs/MCPBoundary

u/madiamo — 7 days ago