u/Creamy-And-Crowded

I keep seeing agent observability framed as the answer to production risk: trace the prompts, the outputs, the tool calls, store everything, replay the run...

That's useful, but it also feels very incomplete. If an agent refunds someone, sends an email, updates a ticket, changes a subscription, or touches internal data, the interesting question is not only what it did, but why it was allowed to do it.

A trace can show that the agent called a tool, but it does not necessarily show that the agent had enough evidence from a trusted source, that the action matched the user’s intent, or that the policy check actually meant anything.

So in a lot of systems we are building amazing high-resolution, searchable, timestamped crash footage: a detailed record of what happened, available only after the fact.

The missing layer, in my opinion, is runtime justification: a check, made before the action runs, of why it is allowed. Maybe this is only a problem once agents touch money, customer data, legal workflows, support operations, or external communications, but isn't that exactly where everyone wants to deploy them?
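
To make "runtime justification" a bit more concrete, here is a minimal Python sketch of a gate that runs before a tool call and checks the three things a trace does not necessarily show: trusted evidence, intent match, and an actual policy evaluation. Everything in it is hypothetical and only for illustration (the names `Justification`, `Evidence`, `authorize`, `TRUSTED_SOURCES`, and the example sources and intents); it is not taken from any real framework, and a real system would need much richer checks.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical: sources this system chooses to treat as trustworthy.
TRUSTED_SOURCES = {"billing_db", "order_service"}


@dataclass
class Evidence:
    source: str  # where the fact came from
    claim: str   # what it asserts


@dataclass
class Justification:
    action: str                      # tool call the agent wants to make
    user_intent: str                 # what the user actually asked for
    evidence: list[Evidence] = field(default_factory=list)
    policy_id: str | None = None     # which policy was evaluated, if any


def authorize(
    j: Justification, allowed_intents: dict[str, set[str]]
) -> tuple[bool, list[str]]:
    """Decide, before the action runs, whether it is justified.

    Returns (allowed, reasons_for_denial).
    """
    problems: list[str] = []
    # At least one piece of evidence must come from a trusted source.
    if not any(e.source in TRUSTED_SOURCES for e in j.evidence):
        problems.append("no evidence from a trusted source")
    # The action must be one that the stated user intent permits.
    if j.user_intent not in allowed_intents.get(j.action, set()):
        problems.append("action does not match stated user intent")
    # A policy must actually have been evaluated and recorded.
    if j.policy_id is None:
        problems.append("no policy was evaluated")
    return (not problems, problems)
```

The point of the design is that the decision and its reasons exist before the refund or email goes out, so the trace can then record not just the tool call but the justification that let it through (or the reasons it was blocked).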

Most agent observability feels like crash footage