How are you actually deciding which agent actions need human approval before executing?

I've been thinking a lot about where approval gates belong in agent architectures, and I keep coming back to the same problem: most teams either gate too much (agent becomes unusable) or gate nothing and hope the model makes good decisions.

In January 2026, an AI agent transferred $27M with no human approval gate at all. Not a jailbreak, not a prompt injection — the agent had the permissions and no gate existed. That's a design decision that went wrong.

The framing I've landed on is two axes: irreversibility and impact. High on both (hard to undo, large blast radius) means gate before execution. Low on both means let it run. The hard cases are the mixed quadrants: hard to reverse but low impact, or high impact but easily reversed.
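
A rough sketch of that two-axis rule in Python. The names `Decision` and `classify` are made up for illustration, and the boolean inputs assume you have already decided how to score an action on each axis, which is the hard part. The mixed cases come back as `REVIEW` rather than being resolved, since they need a per-use-case rule.

```python
from enum import Enum


class Decision(Enum):
    AUTO = "auto"      # run without a human
    REVIEW = "review"  # mixed case: needs a per-use-case rule
    GATE = "gate"      # block until a human approves


def classify(reversible: bool, high_impact: bool) -> Decision:
    """Map the two axes onto a gating decision (illustrative only)."""
    if not reversible and high_impact:
        return Decision.GATE   # hard to undo and large impact
    if reversible and not high_impact:
        return Decision.AUTO   # easy to undo and small impact
    return Decision.REVIEW     # the two hard quadrants
```

For example, `classify(reversible=False, high_impact=True)` returns `Decision.GATE`.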

But this still leaves open questions I don't have clean answers to:

What do you do when the gate gets no response? Default to blocked, or default to proceed? I strongly believe it should fail closed, but I've seen teams argue the opposite for UX reasons.
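
To make "fail closed" concrete, here is a minimal sketch assuming approvals arrive on a standard-library `queue.Queue`. The function name `await_approval` and the queue-based transport are assumptions for illustration, not how any particular product does it.

```python
import queue


def await_approval(responses: queue.Queue, timeout_s: float) -> bool:
    """Return True only on an explicit approval; silence counts as a denial."""
    try:
        # Anything other than a literal True (including False) is a denial.
        return responses.get(timeout=timeout_s) is True
    except queue.Empty:
        # Fail closed: no answer within the window means the action stays blocked.
        return False
```

The "default to proceed" variant would return `True` in the `except` branch, which is exactly the behavior I'd argue against.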

How do you handle cascading tool calls where one approved action triggers a second action that should also require approval? Does the first approval carry over?

And at what dollar threshold does a financial action need a gate? $1K? $10K? Depends entirely on the use case but I haven't seen anyone publish a principled framework for this.

Curious how others are drawing these lines in production. What criteria are you actually using?

reddit.com
u/Cybertron__ — 1 day ago

Are you storing AI agent action logs in the same DB as your application? Because that's not an audit log.

Been building agent infrastructure for a while now and I keep seeing the same pattern: teams point to their MongoDB collection or Postgres table and call it their "audit log."

The problem is that if your agent has write access to your application database — which most do, because that's where they do useful work — it has write access to its own event history. A misbehaving agent, a compromised session, or even just a botched migration can quietly alter or remove entries with no visible trace that anything changed.

A real audit log needs one specific property: you cannot modify or delete an entry without the tampering being mathematically detectable. SHA-256 hash-chaining does this — each entry includes the hash of the previous one, so breaking the chain anywhere is immediately visible on validation.
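
A minimal sketch of the chaining idea using Python's `hashlib`. The entry layout, the `GENESIS` placeholder, and the function names are assumptions for illustration; a real log would also need the separate write path and append-only storage.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(log: list) -> bool:
    """Recompute every link; any edited or removed interior entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

One caveat with this sketch: cutting entries off the end of the log still verifies, so the latest hash has to be anchored somewhere the agent cannot write.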

This matters for forensics. When the GitGuardian 2025 report found that 64% of API keys leaked in 2022 were still valid in early 2026, that's partially a detection problem. You need to be able to reconstruct exactly what an agent did, in sequence, with confidence that the record wasn't altered after the fact.

Separate write path. Append-only storage. Hash-chained entries. Exportable.

That's the baseline. Curious whether anyone here has actually implemented this properly in production — and if so, what stack you used for the log storage layer specifically.

u/Cybertron__ — 8 days ago

60% of teams can't terminate a misbehaving agent mid-run. How are you handling kill switches?

Came across this stat in a recent security report and it's been sitting with me: 60% of organizations running AI agents in production cannot terminate a misbehaving agent once it's running. Not "it's difficult." Not "it takes a few minutes." Cannot.

We've built kill switches into every other automated system that touches production — factory robots, payment processors, batch jobs. The moment something behaves unexpectedly you have a way to stop it.

With agents most teams are relying on the process dying naturally or killing the whole service. Neither is acceptable when the agent is mid-transfer, mid-deletion, or mid-email-to-your-entire-customer-list.

Curious what people are actually doing here:

- Are you building explicit abort mechanisms into your agent loops?

- Are you using timeout limits as a proxy for kill switches?

- Or is this a gap you're just living with?
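
For the first option, an explicit abort mechanism can be as small as a shared flag checked between steps. This is a sketch assuming the agent loop is a plain Python loop over callables; `run_agent` and the step representation are hypothetical.

```python
import threading


def run_agent(steps, stop: threading.Event) -> list:
    """Run steps in order, checking the kill switch before each one."""
    completed = []
    for step in steps:
        if stop.is_set():
            break  # operator (or a watchdog) asked us to stop
        completed.append(step())
    return completed
```

Calling `stop.set()` from another thread halts the loop at the next step boundary. It does not interrupt a step already in flight, so mid-transfer or mid-deletion still needs the tool itself to be cancellable or transactional.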

Not looking for theoretical answers. What are you actually running?

u/Cybertron__ — 11 days ago

Johns Hopkins researchers stole API keys from Claude Code, Gemini, and Copilot using only PR titles — and none of the vendors published advisories

In April 2026, researchers at Johns Hopkins demonstrated prompt injection attacks against three production AI coding agents:

- Claude Code Security Review

- Google Gemini CLI Action

- GitHub Copilot Agent

Attack vector: a malicious payload embedded in a PR title or issue body.

No external infrastructure required. GitHub itself served as the C2.

All three vendors paid bug bounties. None published public advisories.

What this reveals architecturally:

The agents couldn't distinguish between legitimate task context and injected instructions, because they're trained to treat all text in context as potentially actionable. That "helpfulness" is the attack surface.

The fix isn't model-level. You can't patch "reads text and follows instructions"; that's the core capability. The control has to sit at the action layer: what the agent is permitted to do with what it reads, regardless of what it's instructed to do.
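
A sketch of what an action-layer control could look like: a static allowlist consulted before any tool runs, independent of whatever text the model read. The action names and the `authorize` helper are hypothetical, not taken from any of the three agents.

```python
# Hypothetical allowlist for a PR-review agent: action name -> permitted argument names.
ALLOWED_ACTIONS = {
    "read_diff": set(),
    "post_review_comment": {"body"},
}


def authorize(action: str, args: dict) -> bool:
    """Allow an action only if it and all of its arguments are on the allowlist."""
    allowed_args = ALLOWED_ACTIONS.get(action)
    if allowed_args is None:
        return False  # unknown action: denied no matter what the prompt says
    return set(args) <= allowed_args
```

With this in place, an injected instruction to read environment variables fails at `authorize("read_env", {})` regardless of how persuasive the PR title was.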

This is the same access control principle we apply to every other automated system. We just forgot to apply it to agents.

Source: [Johns Hopkins / GitHub Actions research, April 2026]

u/Cybertron__ — 13 days ago

Hybrid architecture.

"why we run deterministic rules before the LLM on every single tool call

been building agent infrastructure for a while and the question i get most is why not just let the LLM decide what's safe and what isn't. here's the actual reasoning:

speed. a regex policy check runs in under 50ms. an LLM inference call adds 200-800ms minimum. for a governance layer that fires on every tool call that latency compounds fast.

predictability. LLMs are probabilistic. a hard rule that says 'block any tool call containing DROP TABLE' fires 100% of the time. an LLM asked the same question gets it right 97% of the time. the 3% is your production incident.

cost. if your agent makes 1000 tool calls a day and you're running an LLM check on every one of them you've just added significant inference cost to your governance layer on top of your agent cost.

so the architecture we landed on is layered. deterministic rules catch the obvious stuff first, fast and cheap. PII patterns, dangerous operations, policy violations that are black and white. the LLM gate only fires on the ambiguous cases the rules can't resolve confidently. you get 90% of the protection from the logic layer. the LLM handles the 10% that needs judgment.
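
a minimal sketch of that layering in python, assuming the LLM gate is just a callable you pass in. the pattern list, `SAFE_TOOLS`, and the function names are illustrative, not the actual rule set.

```python
import re
from typing import Callable, Optional

# Illustrative rules only; a real rule set would be much larger.
BLOCK_PATTERNS = [re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE)]
SAFE_TOOLS = {"search_docs"}  # hypothetical tools that never need judgment


def rule_verdict(tool: str, payload: str) -> Optional[str]:
    """Deterministic layer: return 'block' or 'allow', or None if undecided."""
    if any(p.search(payload) for p in BLOCK_PATTERNS):
        return "block"
    if tool in SAFE_TOOLS:
        return "allow"
    return None


def govern(tool: str, payload: str, llm_gate: Callable[[str, str], str]) -> str:
    """Run the rules first; call the (slow, costly) LLM gate only when they abstain."""
    verdict = rule_verdict(tool, payload)
    return verdict if verdict is not None else llm_gate(tool, payload)
```

the point of the `None` return is that the rules layer is allowed to abstain, so the LLM only ever sees the cases the rules could not settle.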

the part most people miss: this also makes your audit trail cleaner. when a rule blocks something you know exactly why. when an LLM blocks something you have to ask it why and trust the explanation. for compliance purposes deterministic decisions are far easier to defend.

anyone else running hybrid approaches or going full LLM on governance?"

u/Cybertron__ — 14 days ago

My LangGraph agent deleted production records last month. Here's what I learned about governing tool calls.

Running a LangGraph agent in production that had access to our database for a legitimate reason. One bad prompt, one edge case the policy didn't cover, and it ran a delete operation it absolutely shouldn't have.

We had logs. We did not have an audit trail that could tell us why the agent decided to do it, what it saw before it made that call, or whether any policy was even evaluated.

Spent the next week building an interceptor layer that sits before every tool call executes — policy check, PII scan, log the full input payload, escalate to a human if it looks risky. Not complicated in concept, genuinely painful to build properly.
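
A stripped-down sketch of the interceptor shape, framework-agnostic and not the actual Polaxis.io code. `intercept`, `Blocked`, and the three-valued policy result are assumptions for illustration; here an "escalate" verdict simply stops the call, where a real system would hand off to a human approval flow.

```python
from typing import Callable


class Blocked(Exception):
    """Raised when a tool call is not allowed to execute."""


def intercept(tool: Callable[..., object], policy: Callable[[dict], str], log: list):
    """Wrap a tool so every call is policy-checked and logged before it runs."""
    def wrapped(**kwargs):
        verdict = policy(kwargs)  # expected: "allow", "escalate", or "block"
        # Record the full input payload and the verdict before anything executes.
        log.append({"tool": tool.__name__, "input": kwargs, "verdict": verdict})
        if verdict != "allow":
            raise Blocked(f"{tool.__name__}: {verdict}")
        return tool(**kwargs)
    return wrapped
```

Logging before execution matters: it is what lets you answer "was a policy even evaluated, and what did the agent send?" after the fact.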

Things I didn't anticipate needing:

  • Prompt injection detection in tool inputs (agents reading external data can get injected)
  • Tamper-evident logging (regular DB logs aren't compliance evidence)
  • Budget enforcement — agents in loops can burn through API budgets fast
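
For the budget point, a hard cap checked before each paid call is enough to stop a runaway loop. A minimal sketch, assuming you can estimate the cost of a call up front; the `Budget` class and cent-based accounting are illustrative choices.

```python
class BudgetExceeded(Exception):
    """Raised when a charge would push spend past the configured limit."""


class Budget:
    def __init__(self, limit_cents: int):
        self.limit_cents = limit_cents
        self.spent_cents = 0

    def charge(self, cost_cents: int) -> None:
        """Reserve spend before the call; refuse if it would exceed the cap."""
        if self.spent_cents + cost_cents > self.limit_cents:
            raise BudgetExceeded(
                f"{self.spent_cents + cost_cents} > {self.limit_cents} cents"
            )
        self.spent_cents += cost_cents
```

Calling `charge` before the API request, not after, means the agent in a loop hits the exception instead of the invoice.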

For anyone running agents in production — what's your current approach to this? Are you handling it at the framework level or building something separate?

(Built Polaxis.io to solve this for ourselves, happy to share what the interceptor pattern looks like if useful)

u/Cybertron__ — 16 days ago
▲ 3 r/SaasDevelopers+1 crossposts

I have built something useful for startup founders and businesses that use AI agents for their tasks.

It started when I was finishing my master's degree. A team member on my final project worked in AI governance at an enterprise, so we got into a deep conversation about the industry. Afterwards I went home and started researching what's already out there, what isn't, and where the gaps are. The result is a platform that governs every AI agent deployed in a business.

Features:
Budget control
Policy engine
Firewall specifically designed for AI agents
Compliance report (a record of everything the agent has done so far)
Human in the loop

I'd like you all to check it out. Feedback and critique are welcome. Polaxis.io

u/Cybertron__ — 16 days ago
▲ 1 r/SaaS

Broken post-launch?

What steps do founders usually skip before launch that end up breaking the platform post-launch?

u/Cybertron__ — 20 days ago