u/Creamy-And-Crowded


Most agent observability feels like crash footage

I keep seeing agent observability framed as the answer to production risk: trace the prompts, the outputs, the tool calls, store everything, replay the run...

That's useful, but it also feels very incomplete. If an agent refunds someone, sends an email, updates a ticket, changes a subscription, or touches internal data, the interesting question is not only what it did but, more importantly, why it was allowed to do it.

A trace can show that the agent called a tool, but it does not necessarily show that the agent had enough evidence coming from a trusted place, that the action matched the user’s intent, or that the policy check actually meant anything.

So in a lot of systems, what we are building is amazing crash footage: high-resolution, searchable, and timestamped, but only useful after the fact.

The missing layer, in my opinion, is runtime justification. Maybe this is only a problem once agents touch money, customer data, legal workflows, support operations, or external communications, but isn't that exactly where everyone wants to deploy them?
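To make "runtime justification" a bit more concrete, here is a rough Python sketch of a gate that runs before a tool call and records why the call was or wasn't allowed. Every name in it (`Evidence`, `ProposedAction`, `justify`, `TRUSTED_SOURCES`, `INTENTS_FOR_TOOL`, the tool and source names) is made up for illustration, and it only covers two of the checks mentioned above: evidence from a trusted place, and the action matching the user's intent. A real system would need much more than this.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical: sources the system treats as trustworthy. Assumption, not a standard.
TRUSTED_SOURCES = {"billing_db", "crm"}

# Hypothetical: which user intents are enough to justify calling each tool.
INTENTS_FOR_TOOL = {
    "issue_refund": {"refund_request"},
    "send_email": {"contact_request"},
}


@dataclass(frozen=True)
class Evidence:
    source: str  # where the fact came from, e.g. "billing_db" or "user_message"
    claim: str   # the fact the agent is relying on


@dataclass(frozen=True)
class ProposedAction:
    tool: str                           # tool the agent wants to call
    user_intent: str                    # what the user actually asked for
    evidence: tuple[Evidence, ...] = ()


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reasons: tuple[str, ...]  # why the action was blocked; empty when allowed


def justify(action: ProposedAction) -> Decision:
    """Decide before execution whether the action is justified, and say why not."""
    reasons = []

    # Check 1: at least one piece of evidence must come from a trusted source.
    if not any(e.source in TRUSTED_SOURCES for e in action.evidence):
        reasons.append("no evidence from a trusted source")

    # Check 2: the user's intent must be one that justifies this tool.
    if action.user_intent not in INTENTS_FOR_TOOL.get(action.tool, set()):
        reasons.append(
            f"intent '{action.user_intent}' does not justify tool '{action.tool}'"
        )

    return Decision(allowed=not reasons, reasons=tuple(reasons))


# A refund backed only by the user's own message is blocked, with a stated reason.
refund = ProposedAction(
    tool="issue_refund",
    user_intent="refund_request",
    evidence=(Evidence("user_message", "I was charged twice"),),
)
decision = justify(refund)
```

The point of the sketch is that the `Decision` gets stored next to the trace, so the record answers "why was this allowed" and not just "what happened".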

u/Creamy-And-Crowded — 8 days ago