u/Comfortable_Way8312

Are you actually running AI agents in production? What’s failing the most?

I'm researching production AI agent systems and trying to separate real-world problems from demo-level success.

A lot of agent demos look impressive until they hit:

  • long-running workflows
  • inconsistent tool outputs
  • permission boundaries
  • retries/recovery
  • memory drift
  • context loss
  • hidden hallucinations
  • orchestration complexity

What surprised me is that the actual “reasoning” often isn’t the biggest problem.

The bigger issues seem to be:

  • reliability
  • state management
  • workflow continuity
  • evaluation/testing
  • governance
  • infrastructure costs

For people actually running agents in production (or even serious internal tooling):

  • what stack are you using?
  • what works better than expected?
  • what constantly breaks?
  • what problem became bigger than you originally thought?

Especially curious about:

  • memory systems
  • multi-agent coordination
  • long-term context
  • human approval flows
  • observability/debugging

Would love to hear real experiences rather than hype.
Even failed experiments are useful.

u/Comfortable_Way8312 — 10 days ago