u/Background-Job-862

r/mlops r/MCPservers r/LLM_Gateways r/LocalLLM r/LLMDevs r/mcp

Is there actually a “best” MCP gateway yet, or is everyone just solving different halves of the problem?

Spent the last few weeks trying to answer this for our own stack and came away thinking the question itself is slightly wrong right now. Docker’s MCP gateway is genuinely nice for local dev - container isolation per server, credential handling baked into Docker Desktop - but it’s not really built for cross-team, cross-region enterprise governance. The community mcp-gateway-registry project is solid if you want to bring your own Keycloak/Entra OAuth and don’t mind assembling the pieces yourself. Kong shipped an MCP layer as part of their broader AI gateway, which makes sense if MCP is one traffic type among several you already govern with Kong, but feels heavy if MCP is your only concern. TrueFoundry approaches it as identity-and-token-scoping first, resolving agent identity separately from user identity and minting scoped tokens per MCP server, which matters a lot once you have agents acting on behalf of users, less if you’re still single-user (this is the one I ended up using for my team).

The honest answer is that the “best” depends on whether your problem is discovery (which servers exist), governance (who can call what), or just getting something running fast for a demo. I think which problem you’re actually facing, and what you’re optimizing for, determines the right answer more than any feature checklist does. What problems are others here facing?

reddit.com

u/Background-Job-862 — 2 days ago

▲ 3 r/mcp

MCP proxy vs MCP gateway - I spent 3 months on the wrong one first

When we started with MCP, I set up an MCP proxy and called it done. It handled the stdio-to-HTTP transport gap, agents could reach the servers, everything worked. I didn't think much about the distinction between a proxy and a gateway.

Three months later we had a contractor incident (their server access wasn't revoked after they left), a near-miss with a write tool that had too-broad permissions, and a security audit that asked for a tool call log we couldn't produce. None of those are proxy problems, they're all governance problems.

Here's the distinction that eventually clicked for me:

A proxy answers: can this request reach its destination? It handles transport. stdio → HTTP, WebSocket, SSE. It forwards bytes. It doesn't know what a tool call is, doesn't check who's authorized to make it, doesn't log it in any structured way.

A gateway answers: should this request be allowed to happen and is there a record that it did? Same transport capabilities, but adds: identity (who is calling), RBAC (are they allowed to call this tool), guardrails (inspect what comes back before the agent sees it), audit trail (structured log with user identity, parameters, result).

The proxy is the right answer when your only problem is transport - you have a local stdio server and need to expose it over HTTP. Once you have multiple users, multiple agents, access control requirements, or an audit requirement, you've crossed into gateway territory.
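
To make the split concrete, here is a minimal sketch, assuming a toy in-process setup rather than any real MCP SDK. `ToolCall`, `Gateway`, and `fake_server` are hypothetical names made up for illustration. The proxy only forwards; the gateway checks who is calling and records what happened. Guardrails on the response are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical request type for illustration; not part of any MCP SDK.
@dataclass
class ToolCall:
    user: str
    tool: str
    params: dict


def proxy(call: ToolCall, forward: Callable[[ToolCall], str]) -> str:
    # A proxy only moves the request to its destination.
    return forward(call)


@dataclass
class Gateway:
    forward: Callable[[ToolCall], str]
    allowed: dict[str, set[str]]  # user -> tools that user may call
    audit_log: list[dict] = field(default_factory=list)

    def handle(self, call: ToolCall) -> str:
        # RBAC check: is this user allowed to call this tool?
        permitted = call.tool in self.allowed.get(call.user, set())
        result = self.forward(call) if permitted else "denied"
        # Structured record of who called what, with which parameters.
        self.audit_log.append({
            "user": call.user,
            "tool": call.tool,
            "params": call.params,
            "allowed": permitted,
            "result": result,
        })
        return result


def fake_server(call: ToolCall) -> str:
    # Stand-in for a real MCP server.
    return f"ran {call.tool}"


gateway = Gateway(forward=fake_server, allowed={"alice": {"search_docs"}})
read = gateway.handle(ToolCall("alice", "search_docs", {"q": "mcp"}))
write = gateway.handle(ToolCall("alice", "delete_page", {"id": 7}))
```

The same `delete_page` call would go straight through `proxy(...)`, which is roughly the situation the near-miss above describes.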

What tripped us: we were solving a transport problem when we actually had a governance problem. Took us too long to realise the proxy wasn't going to grow into what we needed.

Has anyone else gone through this? Is it common to start with a proxy and then realise you need something else?

u/Background-Job-862 — 5 days ago

▲ 14 r/mlops

How we finally got real observability into our LLM stack

For the first 4 months of running LLMs in production, our observability was mostly vibes. App responded → good. App timed out → bad. We basically had no insight into what was actually happening.
Setting up proper LLM observability turned out to be one of the highest-ROI things we did. Here's what I learned setting it up:

Our setup is primarily gateway-level tracing with OpenTelemetry export to our existing stack. Every LLM request now has: model called, input tokens, output tokens, latency (wall time + time-to-first-token + inter-token latency), cost, and metadata tags for user/team/feature/environment. These traces land in Datadog alongside our regular app traces.
For agent workflows, we capture the full trace: which tools were called, in what order, what the model's intermediate reasoning was (for models that support it), and the final response. This is invaluable for debugging.
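
As a rough illustration of the per-request fields above, here is a sketch of a trace record flattened into span-style attributes. This is plain Python, not the OpenTelemetry SDK, and the model names and per-1K-token prices are made-up assumptions, not real pricing.

```python
from dataclasses import dataclass, asdict

# Hypothetical (input, output) prices per 1K tokens; real prices vary by provider.
PRICE_PER_1K = {
    "small-model": (0.0005, 0.0015),
    "large-model": (0.005, 0.015),
}


@dataclass
class LLMTrace:
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: float  # wall time
    ttft_ms: float     # time to first token
    team: str
    feature: str
    environment: str

    @property
    def cost_usd(self) -> float:
        price_in, price_out = PRICE_PER_1K[self.model]
        return (self.input_tokens * price_in + self.output_tokens * price_out) / 1000

    def to_attributes(self) -> dict:
        # Flat key/value pairs, the shape most tracing backends accept as span attributes.
        attrs = asdict(self)
        attrs["cost_usd"] = self.cost_usd
        return attrs


trace = LLMTrace("large-model", 2000, 500, 1200.0, 300.0, "search", "summarize", "prod")
attributes = trace.to_attributes()
```

Having team, feature, and environment on every record is what makes it possible to slice cost and latency after the fact.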

A few really important insights this revealed for us:

One team was calling GPT-4o for a task that only needed basic extraction. The model was 10x more expensive than needed and actually slower, so we moved to a smaller model and cost dropped by about 85% for that feature.
We had a semantic cache enabled but with the wrong similarity threshold. Half our "cache hits" were on requests that shouldn't have matched. We tuned the threshold and improved the cache hit rate meaningfully.
One of our RAG pipelines had an embedding call that was adding 400ms every time. Not obvious without per-step latency. We fixed it by caching the embeddings.
Our "prod" and "dev" environments were sharing rate limit quotas, so dev was sometimes throttling prod. We added environment-based quota separation.
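
The semantic cache point is easier to see with numbers. Below is a minimal sketch, assuming toy 2-dimensional embeddings and a plain cosine-similarity lookup; the vectors, thresholds, and `cache_lookup` helper are illustrative and not taken from any particular gateway.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def cache_lookup(query_vec, cache, threshold):
    """Return the most similar cached response, or None if nothing reaches the threshold."""
    best_response, best_score = None, threshold
    for vec, response in cache:
        score = cosine(query_vec, vec)
        if score >= best_score:
            best_response, best_score = response, score
    return best_response


cache = [([1.0, 0.0], "refund policy answer")]
related_but_different = [0.8, 0.6]  # cosine similarity of about 0.8 to the cached entry

loose = cache_lookup(related_but_different, cache, threshold=0.75)   # counts as a hit
strict = cache_lookup(related_but_different, cache, threshold=0.95)  # counts as a miss
```

With the loose threshold, a merely related question gets someone else's cached answer, which is the kind of false hit described above.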

We use TrueFoundry's gateway for the OTEL export and it pipes directly into our existing Datadog setup without a separate observability vendor. But honestly the specific tool matters less than the pattern: get your LLM traces into whatever stack your team already uses for the rest of your services. The value is in correlation, not in a standalone LLM dashboard nobody checks.

The single question worth asking right now: if someone asked you why your p95 LLM latency spiked on a random Tuesday, could you answer it?

If not, that's the gap. What's everyone else exporting to? Are you using Datadog/Grafana or dedicated LLM observability tools like Langfuse or Arize?

u/Background-Job-862 — 5 days ago

▲ 10 r/mlops

What does llm governance mean in practice?

LLM governance is one of those terms that gets used constantly these days, but with wildly different meanings depending on who's talking. I've heard it mean everything from "we have a content filter" to "we have a full compliance program with audit trails and model risk management".

We've been building this out for the past year. Here's how I'd now break it down into layers that actually map to things you build:

Layer 1: Access control - who can call which models, with what keys, with what limits. This is the gateway layer - virtual keys, per-team rate limits, budget caps. Most teams start here because cost incidents force it.

Layer 2: Content governance - what can go in and come out. Input guardrails (prompt injection detection, PII scrubbing before data leaves your perimeter), output guardrails (content moderation, safety checks). The key design question: validate-only (flag and block) vs mutate (modify the content). Both are useful for different cases.

Layer 3: Audit and observability - every request logged with enough context to answer: who made it, what model, what it cost, what the prompt contained, what the response was. The hard part isn't capturing the data - it's making it queryable in a format a compliance team can actually use, not raw JSON logs.

Layer 4: Model risk management - which models are approved for which use cases. Who decides when a new model goes on the approved list. What happens when a model is deprecated. This is the most organization-specific layer and usually the last one teams formalize.

Layer 5: Agent and tool governance - if you're running agents that call tools via MCP: which agents can call which tools, under which user identity, with what audit trail per invocation. This layer didn't exist two years ago and most governance frameworks haven't caught up to it.
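
For the Layer 2 design question, here is a small sketch of the two guardrail styles, assuming email addresses are the only PII you care about. The regex is deliberately simple and the function names are made up for illustration; a real PII scrubber would need much more than this.

```python
import re

# Simplified email pattern for illustration only; it will miss edge cases.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class GuardrailViolation(Exception):
    """Raised when a validate-only guardrail blocks a request."""


def validate_only(prompt: str) -> str:
    # Flag and block: the request never reaches the model.
    if EMAIL.search(prompt):
        raise GuardrailViolation("prompt contains an email address")
    return prompt


def mutate(prompt: str) -> str:
    # Modify the content and let the request continue.
    return EMAIL.sub("[REDACTED_EMAIL]", prompt)


scrubbed = mutate("Contact jane.doe@example.com about the invoice")
```

Validate-only is easier to reason about in an audit; mutate keeps the user flow working at the cost of changing what the model sees.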

Most teams I've talked to have layer 1 in some form, maybe layer 2, and are improvising on 3-5... what layer is causing the most pain in your current setup?

u/Background-Job-862 — 7 days ago

▲ 7 r/MCPservers+1 crossposts

What are you guys using these days to manage multiple MCP servers in a production agentic setup?

We're building a multi-agent system where agents need access to an internal knowledge base, Slack, GitHub, our CRM, and a few internal APIs wrapped as MCP servers.

In dev this was fine because there was just one server and one agent, everything local. Now that we're moving towards production, I'm realizing the MCP management problem is a lot more real.

How do you handle service discovery? Agents need to know what tools are available without hardcoded lists.
What do you use for auth? Especially for servers that need to act on behalf of specific users, like posting to Slack under a real user identity.
Do you have any tooling for observability over MCP calls? My agent sometimes makes 20+ tool calls per task and I have no insight into which ones are slow or failing.

I've been looking at MCP gateways as the abstraction layer instead of wiring agents directly to MCP servers. So far I've looked at Docker MCP Toolkit, Obot, and TrueFoundry. They all take slightly different approaches, and TrueFoundry looks like the one we'll likely move forward with, since it seems the most complete for our needs: it combines registry, RBAC, OAuth, and observability in one place.
How are you guys handling this? Are you using a gateway, or just connecting agents directly to MCP servers?

u/Background-Job-862 — 7 days ago

▲ 2 r/LocalLLM

We moved off LiteLLM after 8 months. Sharing what our experience was and what actually pushed us over the edge.

We were LiteLLM users for a while and generally happy with it. Open source, MIT license, broad provider support, it was the right call for where we were.

The move happened because of three things compounding:

1. The YAML config problem at team scale - when it was just me and two other engineers, the config was manageable. Once we had four squads modifying routing rules, we had merge conflicts on the LiteLLM config file twice in one week. There's no real access model for "team A can manage their routing config but can't touch team B's." It's one file. We tried splitting it but that created its own sync issues.

2. SSO - we needed Okta. That's behind the enterprise license. We were already paying for several other tools and adding another enterprise license just for SSO on the gateway felt off, especially when that SSO cost was unlocking features that should arguably be baseline.

3. The Redis incident - LiteLLM uses Redis for distributed rate limiting and our Redis had a brief availability issue during a load test. The rate limits failed open: requests went through without enforcement. In our case it was a test environment so nothing bad happened. But it made us think hard about what happens when this occurs in production during a cost spike. The safety net isn't there when you most need it.
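
The fail-open vs fail-closed choice in point 3 can be sketched in a few lines. This is a generic illustration, not LiteLLM's actual implementation: `InMemoryStore` is a hypothetical stand-in for a shared counter store like Redis, and `allow_request` is a made-up helper.

```python
class StoreUnavailable(Exception):
    """Raised by the hypothetical counter store when it cannot be reached."""


class InMemoryStore:
    # Stand-in for a shared counter store such as Redis; not a real client.
    def __init__(self):
        self.counts = {}
        self.available = True

    def incr(self, key: str) -> int:
        if not self.available:
            raise StoreUnavailable(key)
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


def allow_request(store, key: str, limit: int, fail_open: bool) -> bool:
    try:
        return store.incr(key) <= limit
    except StoreUnavailable:
        # Fail open: let traffic through unmetered. Fail closed: reject it.
        return fail_open


store = InMemoryStore()
decisions = [allow_request(store, "team-a", limit=2, fail_open=False) for _ in range(3)]

store.available = False  # simulate the outage
during_outage_open = allow_request(store, "team-a", limit=2, fail_open=True)
during_outage_closed = allow_request(store, "team-a", limit=2, fail_open=False)
```

Fail closed trades availability for cost safety, so which default is right depends on whether a brief outage or an unbounded bill hurts more.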

We evaluated a few things before moving (Portkey, Helicone, Kong, TrueFoundry and a couple of others) and eventually landed on TrueFoundry. Happy to share notes on any of them if useful.

Has any of this pushed you off LiteLLM as well? And if you've stayed, how have you handled the config scaling problem?

u/Background-Job-862 — 10 days ago

▲ 10 r/MCPservers+1 crossposts

We went from "give everyone access to all MCPs" to proper governance - here's how

Six months ago our MCP setup was: one api key per server, everyone on the team had access to everything, no logs. Classic startup "move fast" situation.

Then someone on our team accidentally triggered a Jira bulk-edit tool call via an agent and we had... chaos. Nothing catastrophic, thankfully, but it surfaced an important realisation: we had zero guardrails on what agents could do with our tools.

What we have built since then:

Centralized MCP registry - all MCP servers register in one place. Agents and users discover available tools through that registry rather than hardcoded lists. When we add a new server, it's immediately available to the right people.

RBAC per server, per team - eng gets access to GitHub and Sentry MCPs. Support gets Zendesk and Confluence. Finance gets their specific internal tools. Access is managed centrally, not per-server.

OAuth 2.0 for server-level auth - we were using Okta already. Integrated that with the MCP gateway so agent requests are authenticated against real user identities, not shared service account keys. Huge for compliance.

Tracing every call - every tool call now has: who triggered it, which agent, which tool, input/output, latency, whether it succeeded. This is non-negotiable if you care about auditability.
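
A minimal sketch of how the registry and per-team RBAC fit together, assuming a simple in-memory registry. The `MCPServer` type, the placeholder URLs, and the `discover` helper are hypothetical and not how any specific gateway stores this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MCPServer:
    name: str
    url: str          # placeholder URLs, not real endpoints
    teams: frozenset  # teams allowed to use this server


# Central registry: one place to add a server and decide who can see it.
REGISTRY = [
    MCPServer("github", "https://mcp.internal.example/github", frozenset({"eng"})),
    MCPServer("sentry", "https://mcp.internal.example/sentry", frozenset({"eng"})),
    MCPServer("zendesk", "https://mcp.internal.example/zendesk", frozenset({"support"})),
    MCPServer("confluence", "https://mcp.internal.example/confluence", frozenset({"support"})),
]


def discover(team: str) -> list[str]:
    """Return the servers a team can see, instead of a hardcoded list per agent."""
    return sorted(server.name for server in REGISTRY if team in server.teams)


eng_servers = discover("eng")
support_servers = discover("support")
```

Adding a server is then a single registry entry, and discovery and access control stay in sync because they read the same data.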

We tried quite a few approaches and eventually landed on TrueFoundry's MCP Gateway because it matched what we needed around authentication, access controls, and observability. The migration itself took about a week, including moving over our existing servers.

The bigger lesson for us, though, wasn't about the specific tool. It was that once agents start interacting with real systems, MCP stops being just an integration problem and becomes a governance problem. The protocol makes connecting tools easy; figuring out who can use those tools, under what conditions, and how you audit what happened afterward is where most of the operational work begins.

How are you guys handling MCP access control and auditability? Are you managing it centrally, or still doing it server-by-server?

u/Background-Job-862 — 11 days ago

▲ 20 r/LLM_Gateways+2 crossposts

What's everyone actually using for an AI gateway in prod? Tired of duct-taping LiteLLM together

We're a mid-size eng team, actively building on LLMs. Started with LiteLLM as a proxy because it was the obvious free option and it worked fine for a while.

Problem is we're now at a point where:

multiple teams hitting the same OpenAI/Anthropic keys with zero visibility into who's burning what
had an incident where one team's batch job ate through our entire monthly quota in 4 hours
no clean way to do fallbacks - when Anthropic had that streaming outage a few months back we were just down

We've looked at Portkey (feels very SaaS-y, and it was acquired recently, which introduces uncertainty around future priorities and roadmap alignment, worth considering for a long-term infrastructure decision), Helicone (good observability but routing feels thin), and briefly at building something in-house (our infra lead said absolutely not).

Also came across TrueFoundry, which seems more enterprise-focused, claims sub-3ms overhead, and has priority-based fallback chains built in.
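
For anyone unfamiliar with the term, a priority-based fallback chain is conceptually small. Here is a minimal sketch, assuming each provider is just a callable that raises on failure; `ProviderError`, `primary`, and `secondary` are made-up names, and real gateways add timeouts, retries, and health checks on top.

```python
class ProviderError(Exception):
    """Raised by a provider callable when the upstream request fails."""


def call_with_fallback(prompt, providers):
    """Try providers in priority order; return (provider name, response)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


def primary(prompt):
    # Simulates an upstream outage.
    raise ProviderError("streaming outage")


def secondary(prompt):
    return f"echo: {prompt}"


used, reply = call_with_fallback("hello", [("primary", primary), ("secondary", secondary)])
```

The hard parts in production are deciding which errors should trigger a fallback and making sure the backup model is actually acceptable for the task.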

What is everyone else using? Any other options I'm missing?

Not looking for the "just use this" response here, just trying to understand the tradeoffs. We're on AWS, will need VPC deployment eventually.

u/Background-Job-862 — 11 days ago

▲ 24 r/MCPservers+1 crossposts

How we secured 15 MCP servers without losing our minds - auth setup that works

Eight months ago our MCP auth story was: shared API key in a .env file, everyone had access to everything, fingers crossed nothing bad happens.

Two near-misses later (one agent almost deleted production data via a misconfigured write tool, one case of a contractor's MCP access not being revoked after they left), we got serious about it.

Here's where we landed after months of evaluation:

One API key for everything - This sounds counterintuitive but hear me out. Instead of each MCP server having its own key management, we route everything through a central gateway. Agents get one gateway key. That key's permissions are defined in the gateway, not in 15 different server configs. When an agent's access needs to change, we change it in one place.

RBAC at the tool level - We can say "Agent A can list_channels in Slack but can't send_message." That level of control made a huge difference.

OAuth for user-delegated actions - For actions that should run as a real user (like posting to Slack), we use OAuth 2.0 with Okta. The gateway handles token exchange and refresh, so agents never deal with OAuth directly.

Audit logs for every call - Every MCP tool invocation is logged - agent, user, tool, parameters, response, and timestamp. Security wanted it, but it's also become one of our best debugging tools.
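
Here is a rough sketch of how one gateway key, tool-level RBAC, and audit logging fit together, assuming an in-memory permission table. The key name, `authorize`, and `revoke` are hypothetical and only illustrate the shape of the idea, not any vendor's API. The logged response field is omitted for brevity.

```python
from datetime import datetime, timezone

# Hypothetical gateway-side table: gateway key -> allowed (server, tool) pairs.
PERMISSIONS = {
    "key-agent-a": {("slack", "list_channels")},
}
AUDIT_LOG = []


def authorize(gateway_key: str, server: str, tool: str, user: str, params: dict) -> bool:
    allowed = (server, tool) in PERMISSIONS.get(gateway_key, set())
    # Every attempt is logged, including denied ones.
    AUDIT_LOG.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_key": gateway_key,
        "user": user,
        "server": server,
        "tool": tool,
        "params": params,
        "allowed": allowed,
    })
    return allowed


def revoke(gateway_key: str) -> None:
    # One-place change: removing the key cuts access to every server at once.
    PERMISSIONS.pop(gateway_key, None)


can_list = authorize("key-agent-a", "slack", "list_channels", "alice", {})
can_send = authorize("key-agent-a", "slack", "send_message", "alice", {"text": "hi"})
revoke("key-agent-a")
after_revoke = authorize("key-agent-a", "slack", "list_channels", "alice", {})
```

The contractor case above is the `revoke` path: one deletion instead of hunting through 15 server configs.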

We looked at a few different options while evaluating this - Cloudflare's MCP Gateway, Kong AI Gateway, and Portkey all came up during the process. They each solved parts of the problem, but for us the priority wasn't just exposing MCP servers. We needed centralized authentication, fine-grained RBAC, Okta integration, and audit logs in one place since we were already standardizing our AI infrastructure.

We ended up going with TrueFoundry's MCP Gateway because it checked those boxes without requiring us to stitch together multiple systems. The Okta integration took about a day to configure, and setting up RBAC across our MCP servers took another day or so. After that, onboarding new agents and revoking access became a one-place change instead of updating permissions across every server.

The biggest lesson for me: define your authentication and authorization model before you have dozens of MCP servers. Retrofitting it later is a lot more painful than getting it right upfront, FR.

u/Background-Job-862 — 17 days ago

▲ 0 r/LocalLLM+1 crossposts

My team evaluated 5 AI gateways for deployment - here's my honest breakdown

We spent about 6 weeks properly evaluating options before committing. Our requirements: VPC deployment (data can't leave our cloud), unified API for 10+ models, per-team rate limiting + cost attribution, auditability for compliance, and <5ms gateway overhead.

Quick breakdown of what my team and I found:

LiteLLM - Great for getting started: huge model support, genuinely good open-source project. Falls apart when you need enterprise auth (RBAC is bolted on), rate limiting per user is painful to configure, and at scale the Python proxy starts showing latency issues. Amazing for solo devs / small teams.

Portkey - Their versioned prompts UI is legitimately good. Rate limiting and RBAC feel secondary though, and we couldn't get the VPC deployment to work as smoothly as advertised within our timeline.

Helicone - If you just want to see what's happening with your LLM calls, nothing beats it. Routing/fallback capabilities are thin. Not the right fit if governance is your primary concern.

Kong AI Gateway - Powerful if you're already a Kong shop. Steep learning curve. Felt like it was retrofitting AI features onto an API gateway, not built from the ground up for LLMs.

TrueFoundry - This is what we ended up going with. The key differentiators for us were proper VPC/on-prem deployment with data sovereignty, priority-based routing with fallback chains that actually work, sub-3ms latency overhead in our own testing, and RBAC + budget limits. The observability covers what we need. Gartner apparently featured them in a 2026 report on optimizing GenAI costs, which was a nice external validation signal for us going through the procurement process.

Happy to answer questions on any of the above.

u/Background-Job-862 — 11 days ago