How are you actually deciding which agent actions need human approval before executing?
I've been thinking a lot about where approval gates belong in agent architectures, and I keep coming back to the same problem: most teams either gate too much (agent becomes unusable) or gate nothing and hope the model makes good decisions.
In January 2026, an AI agent transferred $27M with no human approval gate at all. Not a jailbreak, not a prompt injection — the agent had the permissions and no gate existed. That's a design decision that went wrong.
The framing I've landed on is two axes: reversibility and impact. Hard to reverse and high impact means gate before execution. Easily reversed and low impact means let it run. The hard cases are the mixed ones: hard to reverse but low impact, or high impact but easily reversed.
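To make the two axes concrete, here is a minimal sketch of how that framing could map to a decision. The names (`Decision`, `classify`) and the boolean simplification of each axis are my own assumptions for illustration; real systems would likely score both axes on a scale rather than as yes/no.

```python
from enum import Enum


class Decision(Enum):
    AUTO = "auto"            # run without asking
    GATE = "gate"            # block until a human approves
    REVIEW = "needs_policy"  # mixed case: no obvious default


def classify(reversible: bool, high_impact: bool) -> Decision:
    """Map the two axes (reversibility, impact) to a gating decision."""
    if not reversible and high_impact:
        return Decision.GATE
    if reversible and not high_impact:
        return Decision.AUTO
    # Hard cases: irreversible but low impact, or high impact but reversible.
    return Decision.REVIEW
```

The sketch deliberately refuses to pick a default for the mixed cases, since that is exactly where I don't think a generic rule holds.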
But this still leaves open questions I don't have clean answers to:
What do you do when the gate gets no response? Default to blocked, or default to proceed? I strongly believe it should fail closed, but I've seen teams argue the opposite for UX reasons.
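For reference, this is roughly what I mean by fail closed. It is a sketch, not a production implementation: `wait_for_approval` and the queue-based transport are hypothetical, standing in for whatever channel your approvers actually respond on.

```python
import queue


def wait_for_approval(responses: "queue.Queue[bool]", timeout_s: float) -> bool:
    """Return True only on an explicit approval; silence counts as a denial."""
    try:
        # Anything other than an explicit True is treated as "not approved".
        return responses.get(timeout=timeout_s) is True
    except queue.Empty:
        return False  # fail closed: no response within the timeout means blocked
```

The fail-open variant would return `True` in the `except` branch, which is the behavior I'd argue against for anything irreversible.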
How do you handle cascading tool calls where one approved action triggers a second action that should also require approval? Does the first approval carry over?
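One possible answer, which I have not validated in production, is to bind each approval to the exact tool call it was granted for, so it cannot carry over to a follow-up action. The sketch below assumes tool arguments are JSON-serializable; `fingerprint` and `ApprovalLedger` are hypothetical names.

```python
import hashlib
import json


def fingerprint(tool: str, args: dict) -> str:
    """Stable hash of one specific tool call (tool name plus exact arguments)."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ApprovalLedger:
    """Tracks single-use approvals, each tied to one exact tool call."""

    def __init__(self) -> None:
        self._approved = set()  # fingerprints of approved, not yet executed calls

    def approve(self, tool: str, args: dict) -> None:
        self._approved.add(fingerprint(tool, args))

    def consume(self, tool: str, args: dict) -> bool:
        """Return True and use up the approval if this exact call was approved."""
        fp = fingerprint(tool, args)
        if fp in self._approved:
            self._approved.remove(fp)
            return True
        return False
```

Under this scheme a second action triggered by the first has a different fingerprint, so `consume` returns `False` and it needs its own approval. The cost is more prompts for the human, which is the usability trade-off again.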
And at what dollar threshold does a financial action need a gate? $1K? $10K? It depends entirely on the use case, but I haven't seen anyone publish a principled framework for this.
Curious how others are drawing these lines in production. What criteria are you actually using?