Agent Observability and what I think
Hey all, I wanted to share a perspective on something I've been thinking about a lot lately.
Traditional APM was built for request-response, and AI agents break that model entirely. Most of what's on the market right now is legacy APM with agent features added, and that leaves a gap you really only feel when things go wrong. You can see the agent's intent (what it decided to do) or the system-level impact (latency, errors, resource usage), but not both in the same trace. So you're flying blind through the exact moments when cost spikes.
I think observability at the agent layer is one of the real problems in this space. It's not solved yet, but it's defined well enough that you can instrument properly if you start now.
UC Santa Cruz published research on this last year (arXiv:2508.02736). They used eBPF to intercept TLS traffic and correlate what the agent intended to do with what actually happened at the kernel level, at less than 3% overhead. The point is that this is architecturally possible.
About 5% of AI model requests fail in production today (Datadog, April 2026 survey). Sixty percent of those failures are capacity-related, not model errors. So it's an operational gap, not a model-quality one. Teams that built agent-layer observability into their setup caught those failures before they cascaded into outages. Teams that didn't had incidents.
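To put those two percentages on one scale, here's a quick back-of-the-envelope calculation. The 5% and 60% are the figures quoted above; the request volume is a made-up number purely for illustration, so swap in your own.

```python
# Back-of-the-envelope: what the quoted failure rates mean in absolute terms.
# ASSUMPTION: total_requests is a hypothetical volume, not a figure from the survey.
total_requests = 1_000_000

# About 5% of AI model requests fail (figure quoted in the post).
failed = total_requests * 5 // 100

# 60% of those failures are capacity-related (figure quoted in the post).
capacity_related = failed * 60 // 100
other_failures = failed - capacity_related

print(f"failed requests:        {failed}")
print(f"capacity-related:       {capacity_related}")
print(f"everything else:        {other_failures}")
print(f"capacity share overall: {capacity_related * 100 / total_requests:.1f}% of all requests")
```

In other words, roughly 3 out of every 100 requests fail for capacity reasons, which is the slice operational tooling should be able to catch.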
If you're building agents, start with OpenTelemetry. If you're buying a platform, ask the hard questions: Does this handle reasoning loops as a first-class thing? Can you see the decision tree as a continuous trace? Does it know the difference between a tool failing and the agent misunderstanding the tool? Can you alert on semantic drift?
Those are the questions that separate something actually built for agents from something that's just adding agent features to traditional APM. Honeycomb published their approach. Langfuse and LangSmith are solid for multi-step debugging. There are about 15 tools competing on this now, most built on OpenTelemetry standards.
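To make a couple of those questions concrete (the reasoning loop as one continuous trace, and a tool failing vs. the agent misusing the tool), here's a minimal sketch of the span shape I have in mind. Big caveat: this is a stdlib-only stand-in, not the OpenTelemetry API. The Tracer class, the search_orders tool, and the attribute names (agent.intent, outcome, etc.) are all hypothetical names I made up for illustration. In a real setup you'd emit the same structure through your OpenTelemetry SDK instead.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work; parent_id links it into a single tree per trace."""
    name: str
    trace_id: str
    parent_id: Optional[str]
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    attributes: dict = field(default_factory=dict)
    duration_ms: float = 0.0


class Tracer:
    """Hypothetical in-memory tracer (stand-in for a real tracing SDK)."""

    def __init__(self):
        self.trace_id = uuid.uuid4().hex
        self.finished = []   # completed spans, innermost first
        self._stack = []     # currently open spans

    @contextmanager
    def span(self, name, attributes=None):
        parent_id = self._stack[-1].span_id if self._stack else None
        current = Span(name, self.trace_id, parent_id, attributes=dict(attributes or {}))
        self._stack.append(current)
        start = time.perf_counter()
        try:
            yield current
        finally:
            current.duration_ms = (time.perf_counter() - start) * 1000
            self._stack.pop()
            self.finished.append(current)


class ToolError(Exception):
    """The tool itself failed (timeout, upstream error, capacity)."""


class ToolMisuse(Exception):
    """The agent called the tool in a way that violates its contract."""


def search_orders(customer_id):
    # Hypothetical tool: validates its input, then simulates a capacity failure.
    if not isinstance(customer_id, int):
        raise ToolMisuse("customer_id must be an int")
    if customer_id == 503:
        raise ToolError("upstream capacity exceeded")
    return {"customer_id": customer_id, "orders": 2}


def run_agent(tracer, planned_args):
    """Run each planned step inside one root span so intent and impact share a trace."""
    with tracer.span("agent.run", {"agent.goal": "summarize orders"}) as root:
        for step, arg in enumerate(planned_args):
            intent = f"look up orders for {arg!r}"
            with tracer.span("agent.step", {"agent.step": step, "agent.intent": intent}):
                with tracer.span("tool.call", {"tool.name": "search_orders"}) as call:
                    try:
                        search_orders(arg)
                        call.attributes["outcome"] = "ok"
                    except ToolMisuse as exc:
                        # The agent misunderstood the tool: a reasoning problem.
                        call.attributes["outcome"] = "agent_misuse"
                        call.attributes["error.message"] = str(exc)
                    except ToolError as exc:
                        # The tool failed on a valid call: an operational problem.
                        call.attributes["outcome"] = "tool_failure"
                        call.attributes["error.message"] = str(exc)
    return root


tracer = Tracer()
root = run_agent(tracer, [42, "42", 503])
for s in tracer.finished:
    print(s.name, s.attributes, f"{s.duration_ms:.3f}ms")
```

The design choice that matters is that the intent (agent.intent on the step span) and the impact (duration and outcome on the tool span) sit in the same tree under one trace ID, and that "agent_misuse" and "tool_failure" are separate outcomes you can alert on independently.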
My candid assessment is that you're going to be in supervised mode for a while. Your agent still needs human approval, and there is no way around that right now. It's not going away in the next two years. If a vendor tells you otherwise, that's a red flag.
Curious if people can share: a) what does good agent observability actually look like at your scale, and b) what, if anything, are you currently missing on the observability side?