Failures in financial AI agents
For teams deploying LLM/agentic systems into financial workflows: how much of a real concern does operational recovery and problem management become once these systems start taking actions instead of just generating text?
I’m especially curious about cases where the workflow technically “succeeds” at first, but becomes wrong later because of reconciliation mismatches, stale context, invalid state transitions, settlement issues, etc.
Are teams actually defining explicit correctness boundaries, checkpoints, and reversibility rules ahead of deployment, or is most recovery still manual investigation after something breaks?
Trying to understand how mature this is in practice.
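
To make the question concrete, here is a minimal sketch of what I mean by an explicit correctness boundary, a checkpoint, and a reversal path. Everything in it is hypothetical and just for illustration: the `Payment` class, the `ALLOWED` transition table, and the `reconcile` and `settle_with_checkpoint` helpers are not from any real system or library. It assumes the agent's intended amount can be compared against an external settlement record.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from enum import Enum


class State(Enum):
    PENDING = "pending"
    SUBMITTED = "submitted"
    SETTLED = "settled"
    REVERSED = "reversed"


# Hypothetical correctness boundary: any transition not listed here is rejected.
ALLOWED = {
    State.PENDING: {State.SUBMITTED},
    State.SUBMITTED: {State.SETTLED, State.REVERSED},
    State.SETTLED: {State.REVERSED},
    State.REVERSED: set(),
}


class InvalidTransition(Exception):
    """Raised when an action would move a payment along a disallowed edge."""


@dataclass
class Payment:
    payment_id: str
    amount: Decimal
    state: State = State.PENDING
    history: list = field(default_factory=list)

    def transition(self, new_state: State) -> None:
        # Refuse the move before mutating anything, so a bad action leaves no trace.
        if new_state not in ALLOWED[self.state]:
            raise InvalidTransition(f"{self.state.value} -> {new_state.value}")
        self.history.append((self.state, new_state))
        self.state = new_state


def reconcile(payment: Payment, settled_amount: Decimal) -> bool:
    # Checkpoint: the external settlement record must match the intended amount.
    return payment.state is State.SETTLED and payment.amount == settled_amount


def settle_with_checkpoint(payment: Payment, settled_amount: Decimal) -> State:
    # Mark as settled, then verify; on a mismatch, take the predefined reversal path.
    payment.transition(State.SETTLED)
    if not reconcile(payment, settled_amount):
        payment.transition(State.REVERSED)
    return payment.state
```

In this sketch a payment whose settled amount differs from the intended amount ends up in `REVERSED` automatically, and an agent trying to jump straight from `PENDING` to `SETTLED` gets an `InvalidTransition` instead of a silent "success". What I'm asking is whether teams write this kind of thing down before go-live, or only reconstruct it after an incident.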