u/Comfortable_Way8312

r/AI_Agents

Are you actually running AI agents in production? What’s failing the most?

I'm researching production AI agent systems and trying to separate real-world problems from what only works in a demo.

A lot of agent demos look impressive until they hit:

long-running workflows
inconsistent tool outputs
permission boundaries
retries/recovery
memory drift
context loss
hidden hallucinations
orchestration complexity

What surprised me is that the actual “reasoning” often isn’t the biggest problem.

The bigger issues seem to be:

reliability
state management
workflow continuity
evaluation/testing
governance
infrastructure costs

For people actually running agents in production (or even serious internal tooling):

What stack are you using?
What works better than expected?
What constantly breaks?
What problem became bigger than you originally thought?

Especially curious about:

memory systems
multi-agent coordination
long-term context
human approval flows
observability/debugging

Would love to hear real experiences rather than hype.
Even failed experiments are useful.

u/Comfortable_Way8312 — 10 days ago