Instrumenting AI agents to catch loops and credit burn
Woke up to a $500 API bill last week because a "thinking" agent got stuck in a loop overnight. Health checks were green the whole time. Valid JSON, HTTP 200s, no exceptions thrown. The thing had just re-read the same docs page about 2,000 times and called it progress.
The frustrating part isn't the money. It's that every layer of our stack thought the agent was doing fine.
I've started calling this a Semantic Success Trap: the agent passes every syntactic check — valid output, clean status codes, no thrown errors — but the actual task never moves forward. It's stuck in a recursive thought pattern, or worse, it's treating an error message as a successful confirmation and retrying with the exact same failed parameters. Traditional logging can't see it, because from infra's perspective the model is just being "thorough."
A few patterns that have actually helped us catch this earlier:
- Semantic similarity on consecutive steps. Embed the last 3 thoughts or tool outputs. If cosine similarity stays above ~0.95 for 3+ turns, back off or kick to human review. Cheap, and catches most "polite loops."
- Per-trace token budgets, not just global caps. Global API limits only tell you after the damage is done. A hard ceiling per single task execution kills the runaway before it compounds.
- State drift = 0 is a red flag. Log the JSON state at every tool call. If the delta between T and T+1 is zero for 3 turns, the agent is spinning, not thinking.
- "Steps-per-task" as an MLOps metric. When the average step count for a known-simple task jumps 20%, a model update probably introduced a new logic loop. We've caught two regressions this way before users noticed.
What actually moved the needle for us was getting off raw logs and onto trace-level views — being able to see the branching of the decision tree in real time made the loop point obvious in seconds instead of hours. We've been using Evose for that part, mostly because flat logs were lying to us about what the agent was actually doing.
Honestly the bigger shift in my head was realizing "no error" ≠ "task done." Most of our monitoring was built for the first definition. Agents need the second.
How are you handling hallucinated successes — cases where the agent confidently says it finished, but really just gave up inside a loop? Are you catching it at the trace layer, the eval layer, or just from user complaints after the fact?