You might be the last PM in your company

I build the backend plumbing for AI agents — MCP servers and gateways — and the more I do it, the more I think the most underrated fact about the PM job is this: in most orgs, the PM *is* the integration layer.

Engineers have a name for the mess: the M×N problem. M tools × N workflows, each pair needing its own bridge. Someone has to make all those tools talk to each other — and in practice that someone was never the platform team. It was you.

Every time you pull a retention number from your analytics tool, hold it in your head, retype it into a PRD, then break that PRD into tickets by hand — you're manually closing an M×N connection. You were cheaper and more flexible than building the bridge, so the bridge never got built.

MCP (Model Context Protocol) is the first thing I've seen that actually dissolves that layer: one assistant reaching into your whole stack with live access and doing the reconciliation itself. The part I find genuinely interesting isn't “less copy-paste” — it's that a PRD can be grounded in live data instead of a snapshot that's already stale by Friday.
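A back-of-the-envelope sketch of why a shared protocol changes the math. The assumption here is mine and is a simplification: each tool gets wrapped once as a server and each workflow connects once as a client, so the count of things to build grows with M + N instead of M × N. The function names are made up for illustration.

```python
def point_to_point_bridges(tools: int, workflows: int) -> int:
    """Every tool/workflow pair needs its own hand-built bridge."""
    return tools * workflows


def shared_protocol_adapters(tools: int, workflows: int) -> int:
    """Each tool and each workflow implements the shared protocol once."""
    return tools + workflows


if __name__ == "__main__":
    for m, n in [(3, 4), (8, 10)]:
        print(
            f"{m} tools x {n} workflows: "
            f"{point_to_point_bridges(m, n)} bridges vs "
            f"{shared_protocol_adapters(m, n)} adapters"
        )
```

The gap widens as the stack grows, which is roughly why the manual bridge (you) was the cheaper option for so long.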

But I'll be honest about the half the hype skips, because I deal with it on the build side: in practice these connections drop mid-task, and stacking too many servers actually makes the assistant *worse*, not better (each server's tool definitions take up context the model would otherwise use to reason). There's a fix, but most people haven't hit that wall yet.
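To make the context point concrete, here is a rough sketch of the budget arithmetic. Every number in it is invented for illustration, not measured from any real server or model, and the function name is hypothetical.

```python
def context_left_for_reasoning(
    window_tokens: int, tool_tokens_by_server: dict[str, int]
) -> int:
    """Tokens left once every connected server's tool definitions are loaded."""
    overhead = sum(tool_tokens_by_server.values())
    return max(window_tokens - overhead, 0)


if __name__ == "__main__":
    # Illustrative figures only: a 100k-token window and three made-up servers.
    servers = {"analytics": 6_000, "tickets": 9_000, "docs": 4_000}
    print(context_left_for_reasoning(100_000, servers))
```

Each extra server shrinks the remainder before the conversation has even started, which is the cost that rarely shows up in demos.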

Genuinely curious how this sub sees it: for those wiring Claude/ChatGPT connectors into your PM stack — is it actually saving you time, or is the setup and re-auth overhead still eating the gains?

(I wrote a longer breakdown of the PM angle; it's my own Medium post. Happy to drop the link in a comment.)

https://medium.com/p/0a29445370c2

reddit.com
u/SnooPuppers2477 — 5 days ago
▲ 1 r/AISystemsEngineering+1 crossposts

AI Agents in Production: The Failure Modes Nobody Puts in the Demo

Hey everyone,

I’ve spent the last month building and shipping agentic systems into production. If there’s one thing I’ve realized, it’s that the gap between a flashy Twitter/X demo and a stable, secure production agent is a mile wide.

I put together a deep-dive guide breaking down the architectural realities, high-ROI use cases, and the specific security risks that only surface after you ship.

Here is the TL;DR on what happens when agents meet the real world:

1. Chatbots vs. Agents (The Power to Act)

The only difference between a chatbot and an AI agent is one word: act. An LLM generates—it takes text and returns text. An agent takes that output and runs with it (a tool call, a database query, an email). The model is the mastermind, but tools give it hands. The moment software gets hands, your entire design, testing, and security paradigm has to change.

2. The Ideal Use Case Formula

Agents aren't a silver bullet for everything. They thrive where the cost of human attention is high, but the cost of a mistake is low.

  • High ROI: Operational automation, continuous synthesis/monitoring, support deflection, and repository hygiene.
  • The Trap: Building an agent to reason in a vacuum. If it isn't checking its work against environmental ground truth (real tool results, actual error messages) at every turn of its perceive-decide-act loop, it will drift.
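A minimal sketch of a perceive-decide-act loop that checks itself against ground truth on every turn. Everything here is hypothetical: `scripted_decide` stands in for the model, `read_file` is a fake tool, and `FILES` is an in-memory stand-in for the environment. The point is only that the real result or the real error message goes back into the next decision.

```python
from typing import Any, Callable

FILES = {"config.yml": "retries: 3"}  # fake environment


def read_file(path: str) -> str:
    """Fake tool: fails loudly when the path does not exist."""
    if path not in FILES:
        raise FileNotFoundError(path)
    return FILES[path]


def scripted_decide(goal: str, observations: list[dict]) -> dict:
    """Stand-in for the model: reacts to the last real observation."""
    if not observations:
        return {"tool": "read_file", "args": {"path": "config.yaml"}}
    last = observations[-1]
    if not last["ok"]:  # the guess was wrong, so try the other extension
        return {"tool": "read_file", "args": {"path": "config.yml"}}
    return {"tool": "finish", "answer": last["result"]}


def run_loop(
    decide: Callable[[str, list[dict]], dict],
    tools: dict[str, Callable[..., Any]],
    goal: str,
    max_turns: int = 5,
) -> tuple[Any, list[dict]]:
    observations: list[dict] = []  # perceive: ground truth gathered so far
    for _ in range(max_turns):
        action = decide(goal, observations)  # decide
        if action["tool"] == "finish":
            return action["answer"], observations
        try:
            result = tools[action["tool"]](**action["args"])  # act
            observations.append({"tool": action["tool"], "ok": True, "result": result})
        except Exception as exc:
            # Feed the actual error back instead of letting the agent assume success.
            observations.append({"tool": action["tool"], "ok": False, "error": str(exc)})
    raise RuntimeError("no answer within the turn budget")
```

Calling `run_loop(scripted_decide, {"read_file": read_file}, "find the retry setting")` fails once on the wrong filename, sees the error, and recovers on the next turn. Without the observation list, the first wrong guess would never be corrected.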

3. The New Attack Surface (Securing a decision-maker)

With an agent, you're no longer just securing an application; you're securing a decision-maker with credentials. The OWASP Top 10 for LLM Applications covers the risks that are pushing teams to quietly shut down their agent pilots:

  • Indirect Prompt Injection: Your agent reads an untrusted webpage or email containing hidden instructions. The model can't reliably tell data from commands, so it executes the attacker's will.
  • Excessive Agency & Privilege Escalation: Giving an agent broad tool access paired with a weakly scoped CRM or DB connector. A minor reasoning error turns into an unintended database deletion or unauthorized admin action.
  • Data Leakage & Poisoning: Multi-tenant context bleeding, and RAG systems pulling from poisoned knowledge bases to serve malicious data back to users.

4. Designing for Safe Autonomy

Mitigating this isn't about breakthrough AI research; it's disciplined software engineering:

  • Least Privilege at the Tool Boundary: Treat every single tool call as a permission decision. If the agent doesn't have the capability in the first place, prompt injection can't exploit it.
  • Human-in-the-Loop Gates: Reading is cheap; acting is expensive. Let the agent reason freely, but put irreversible, high-stakes operations (payments, deletions, external publishing) behind a human sign-off step.
  • Observability as a First-Class Feature: Trace every step—the context seen, the decision made, the tool used, and the result. Turn "why did the agent go weird?" into a debuggable event log.
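The three controls above can be sketched as a single choke point that every tool call passes through. This is an illustration under my own assumptions, not a reference design: `ToolBoundary`, its fields, and the tool names are all hypothetical, and `approve` stands in for whatever human sign-off mechanism you actually use.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolBoundary:
    tools: dict[str, Callable[..., Any]]
    allowed: set[str]                     # least privilege: granted tools only
    needs_approval: set[str]              # irreversible actions behind a human
    approve: Callable[[str, dict], bool]  # hypothetical sign-off hook
    trace: list[dict] = field(default_factory=list)

    def call(self, name: str, /, **args: Any) -> Any:
        event: dict[str, Any] = {"tool": name, "args": args}
        self.trace.append(event)  # every attempt is logged, even denied ones
        if name not in self.allowed:
            event["outcome"] = "denied"
            raise PermissionError(f"tool not granted: {name}")
        if name in self.needs_approval and not self.approve(name, args):
            event["outcome"] = "rejected_by_human"
            raise PermissionError(f"approval refused: {name}")
        result = self.tools[name](**args)
        event["outcome"] = "ok"
        event["result"] = result
        return result


if __name__ == "__main__":
    boundary = ToolBoundary(
        tools={
            "read_ticket": lambda ticket_id: f"ticket {ticket_id}",
            "delete_ticket": lambda ticket_id: "deleted",
        },
        allowed={"read_ticket", "delete_ticket"},
        needs_approval={"delete_ticket"},
        approve=lambda name, args: False,  # nobody signed off in this demo
    )
    print(boundary.call("read_ticket", ticket_id=7))
    print(boundary.trace)
```

Because the trace is written before the permission checks, denied and rejected attempts show up in the log too, which is usually where the interesting debugging starts.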

The One-Sentence Version: Agents act—that’s why they’re powerful, why they’re risky, and why you must scope their power and gate the actions you can’t take back.

I wrote a much longer breakdown covering these architectural trade-offs, including the decision matrix on whether to build your own loop vs. use a managed agent runtime (declarative vs. hosted).

Check out the full article here if you're interested

Would love to hear from anyone else shipping agents right now. What failure modes are you hitting that caught you off guard?

u/SnooPuppers2477 — 7 days ago

A race condition on a shared agent instance caused a cross-tenant data leak in our multi-tenant AI system

We were close to shipping an AI agent for an ITSM tool — it turns plain-English requests into structured support tickets. Multi-tenant, one deployment serving many companies. Unit tests green, smoke tests clean, dev stable for days.

During concurrency testing I fired two requests at once — two different tenants hitting the same workflow — and Tenant A's response came back populated with Tenant B's data. Reproducible, every time the two overlapped. I pulled the deploy.

Root cause: we created a single agent instance at startup and reused it for every request. Felt efficient — agents are expensive to spin up, so build once and share. The problem: that one shared agent stored the active tenant's context on itself. Under sequential traffic it's invisible — request finishes, next one overwrites the slot, no harm. Under concurrency it's a time bomb: Request B sets tenant_id while Request A is mid-flight, A reads it back, and A gets B's value. Whoever writes last wins.
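Here is a stripped-down reproduction of that interleaving. It is not our actual code: `SharedAgent` and `reproduce_leak` are hypothetical names, and the two `threading.Event` objects exist only to force, on every run, the overlap that production traffic produces by chance.

```python
import threading
from typing import Callable, Optional


class SharedAgent:
    """Anti-pattern: one instance for all requests, tenant stored on itself."""

    def __init__(self) -> None:
        self.tenant_id: Optional[str] = None

    def handle(self, tenant_id: str, slow_step: Optional[Callable[[], object]] = None) -> str:
        self.tenant_id = tenant_id  # write to the shared slot
        if slow_step:
            slow_step()  # stand-in for the LLM call or I/O in the middle
        return f"ticket for {self.tenant_id}"  # read the slot back later


def reproduce_leak() -> dict[str, str]:
    agent = SharedAgent()
    a_has_written = threading.Event()
    b_has_written = threading.Event()
    results: dict[str, str] = {}

    def request_a() -> None:
        def mid_flight() -> None:
            a_has_written.set()
            b_has_written.wait(timeout=5)  # A stays mid-flight until B writes

        results["A"] = agent.handle("tenant-A", slow_step=mid_flight)

    def request_b() -> None:
        a_has_written.wait(timeout=5)  # B arrives while A is still running
        results["B"] = agent.handle("tenant-B", slow_step=b_has_written.set)

    threads = [threading.Thread(target=request_a), threading.Thread(target=request_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(reproduce_leak())
```

Request A comes back with tenant-B's value. Run the same two requests one after the other and both answers are correct, which is exactly why sequential tests stay green.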

What makes agents especially prone to this is that they feel like an object you build once and reuse, and they naturally accumulate state — prompt, retrieved docs, memory, tool results. Every one of those is a slot where per-tenant data can come to rest on something shared. And the failure mode isn't a 500 anyone notices; it's a fluent, confident answer about the wrong company.

Why nothing caught it: every test we owned ran one request at a time. Unit tests are great at proving correctness in isolation and completely blind to two requests stepping on each other. Green tests meant "correct in isolation," not "safe under load" — and for a multi-tenant system those are very different claims.

The fix: the quick patch is per-request instances so there's no shared slot. But that only closes one door. We moved tenancy off the agent entirely and pushed it to the tool boundary — the agent holds no tenant state, every tool call carries its own tenant scope + scoped credentials, and the boundary enforces it per call, so even a hallucinated wrong-tenant request can't cross it. Underneath that: row-level security at the data layer, plus a last-line assertion that every returned record's tenant ID matches the requester. Defense in depth, because any single layer can fail silently.
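A rough sketch of what enforcement at the tool boundary can look like. The names (`TenantScope`, `list_tickets`, `fetch_rows`) and the in-memory tables are hypothetical stand-ins, and the filter in `fetch_rows` only plays the role that row-level security would play in a real database.

```python
from dataclasses import dataclass
from typing import Callable


class TenantViolation(Exception):
    """Raised whenever a call or a result crosses a tenant line."""


@dataclass(frozen=True)
class TenantScope:
    tenant_id: str
    credential: str  # stand-in for a short-lived, tenant-scoped token


# In-memory stand-ins for the credential store and the ticket table.
CREDENTIALS = {"acme": "cred-acme", "globex": "cred-globex"}
TICKETS = [
    {"id": 1, "tenant_id": "acme", "title": "VPN down"},
    {"id": 2, "tenant_id": "globex", "title": "Reset password"},
]


def fetch_rows(tenant_id: str) -> list[dict]:
    """Data layer: this filter plays the role of row-level security."""
    return [row for row in TICKETS if row["tenant_id"] == tenant_id]


def list_tickets(
    scope: TenantScope,
    requested_tenant: str,
    fetch: Callable[[str], list[dict]] = fetch_rows,
) -> list[dict]:
    # 1. The credential must belong to the scope attached to this call.
    if CREDENTIALS.get(scope.tenant_id) != scope.credential:
        raise TenantViolation("credential does not match tenant scope")
    # 2. The tenant the agent asked for (possibly hallucinated) must match too.
    if requested_tenant != scope.tenant_id:
        raise TenantViolation(f"{scope.tenant_id} may not read {requested_tenant}")
    rows = fetch(scope.tenant_id)
    # 3. Last-line assertion: never hand back a record owned by someone else.
    if any(row["tenant_id"] != scope.tenant_id for row in rows):
        raise TenantViolation("data layer returned a foreign record")
    return rows
```

The scope is created per request by the caller and passed into each tool call, so the agent object never holds it. Swapping in a buggy `fetch` that returns every row shows the third check doing its job even when the layer below fails.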

Concurrency + tenant-isolation tests are now first-class in the pipeline — many tenants hitting the same endpoint simultaneously, asserting zero cross-contamination on every change.
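The shape of such a test, sketched with the standard library only. `handle_request` is a hypothetical stateless handler standing in for the real endpoint, and `cross_tenant_leaks` is an illustrative helper, not our pipeline code. One caveat: a clean run makes a race less likely to be present but does not prove there is none.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def handle_request(tenant_id: str, text: str) -> dict:
    """Stateless handler: the tenant travels with the request, nothing is shared."""
    time.sleep(0.001)  # small delay so requests actually overlap
    return {"tenant_id": tenant_id, "summary": f"[{tenant_id}] {text}"}


def cross_tenant_leaks(
    handler: Callable[[str, str], dict],
    tenants: list[str],
    rounds: int = 20,
    workers: int = 16,
) -> list[tuple[str, dict]]:
    """Fire overlapping requests for every tenant and return any mismatched responses."""
    jobs = [(tenant, f"request {i}") for i in range(rounds) for tenant in tenants]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as jobs.
        responses = list(pool.map(lambda job: handler(*job), jobs))
    return [
        (tenant, response)
        for (tenant, _), response in zip(jobs, responses)
        if response["tenant_id"] != tenant
    ]


if __name__ == "__main__":
    leaks = cross_tenant_leaks(handle_request, ["acme", "globex", "initech"])
    assert leaks == [], f"cross-tenant contamination: {leaks}"
    print("no cross-tenant contamination observed")
```

In a real pipeline the handler would be an HTTP call against a deployed instance, and the assertion would compare tenant IDs inside the returned records as well as on the envelope.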

Curious how others handle tenant isolation in stateful/agent systems — do you scope at the tool boundary, the data layer, both? And has anyone found a clean way to make "no per-tenant state on shared objects" enforceable rather than a thing everyone has to remember?

Wrote up the longer version with diagrams here if useful: https://medium.com/@adityadhir97/i-almost-shipped-an-ai-agent-that-could-have-exposed-customer-data-af1c5a750efd

u/SnooPuppers2477 — 10 days ago
▲ 3 r/dev

u/SnooPuppers2477 — 12 days ago