
Notes on monitoring Claude Code in production: OTel temporality, cost-as-estimate, cost attribution by team
Hi! Just a little post about getting visibility into Claude Code usage when an org rolls it out across teams (or for your unhinged self, obviously). Claude Code emits OpenTelemetry metrics now (yay!). That landed quietly in the docs and in the community (who are busy vibe coding, I guess), but it ends up being very practical if you have a real OTel-based o11y stack. I've been running this on a Prometheus backend for a bit and wanted to share the gotchas worth knowing up front, because none of them are obvious from the spec.
For the dashboard side: I built one in PromQL that implements all of this, as the OSS-stack parallel to the existing Azure Application Insights dashboard (25052 by 1w2w3y on Grafana Labs). MIT licensed. Write-up of the implementation with more screenshots is on my blog.
Article: https://rockdarko.dev/posts/grafana-dashboard-for-claude-code-on-prometheus/
Dashboard: https://grafana.com/grafana/dashboards/25255-claude-code-metrics-prometheus/
Repo: https://github.com/rockdarko/claude-code-metrics-prometheus
Things that bit me, or that I had to verify against the source, if you care:
Pin OTel temporality to cumulative. The SDK's default is cumulative now, but defaults have drifted across versions. If you end up with delta on the producer and a Prometheus-flavored consumer expecting cumulative, your rate() queries return wrong numbers silently. Set OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=cumulative explicitly on the Claude Code side and stop wondering.
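To make the "wrong numbers silently" part concrete, here is a toy Python model of the mismatch. It is a sketch, not Prometheus's actual implementation: `prometheus_style_increase` is a made-up helper that only mimics the counter-reset rule (a drop is read as a restart from zero), the token counts are invented, and it assumes delta points get stored as-is, which depends on your pipeline.

```python
def prometheus_style_increase(samples):
    """Total increase over a series, treating any drop as a counter reset.

    Simplified model of how a cumulative-expecting backend reads a counter;
    the real rate()/increase() also extrapolates to the window boundaries.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        # A drop is read as a reset, so the new value is counted from zero.
        total += curr if curr < prev else curr - prev
    return total


# Hypothetical per-interval token counts (what delta temporality exports).
deltas = [10, 5, 12, 12]

# The same activity as a running total (what cumulative temporality exports).
cumulative = []
running = 0
for d in deltas:
    running += d
    cumulative.append(running)

# Ground truth: tokens added after the first sample.
true_increase = cumulative[-1] - cumulative[0]

seen_from_cumulative = prometheus_style_increase(cumulative)
seen_from_delta = prometheus_style_increase(deltas)

print(f"true={true_increase} cumulative={seen_from_cumulative} delta={seen_from_delta}")
```

The cumulative series reproduces the true increase, while the delta series read the same way comes out much lower and nothing errors. That is the failure mode pinning the temporality avoids.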
The cost number is a client-side estimate. Claude Code computes it from token counts and per-model pricing at request time. It's useful for trend visibility and per-team attribution, but it won't match Anthropic's invoice to the cent. Prompt-caching discounts and pricing changes mid-window are the main drift sources.
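A rough Python sketch of what "token counts times per-model pricing" looks like, and how a price table that ignores the cache discount drifts. Everything here is an assumption for illustration: the prices are invented (not Anthropic's), `example-model` is a placeholder, the token-type names are assumed, and this is not Claude Code's actual code.

```python
# Hypothetical USD prices per million tokens. NOT real Anthropic pricing.
HYPOTHETICAL_PRICES = {
    "example-model": {
        "input": 4.00,
        "output": 20.00,
        "cacheRead": 0.40,
        "cacheCreation": 5.00,
    },
}


def estimate_cost_usd(model, tokens, prices=HYPOTHETICAL_PRICES):
    """Estimate one request's cost from token counts and a price table."""
    per_million = prices[model]
    return sum(
        count * per_million[kind] / 1_000_000 for kind, count in tokens.items()
    )


# Invented token counts for a single request.
usage = {"input": 2_000, "output": 500, "cacheRead": 40_000, "cacheCreation": 1_000}
estimate = estimate_cost_usd("example-model", usage)

# Same request, priced with a table that bills cache reads at the full input rate.
no_discount = {
    "example-model": {**HYPOTHETICAL_PRICES["example-model"], "cacheRead": 4.00}
}
estimate_without_discount = estimate_cost_usd("example-model", usage, no_discount)

drift = estimate_without_discount - estimate
print(f"estimate={estimate:.4f} without_discount={estimate_without_discount:.4f}")
```

With a cache-heavy request like this one, the two tables disagree by several multiples, which is why the estimate is good for trends and attribution rather than reconciliation.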
PR / commit counters only increment when Claude Code itself creates the PR or commit (e.g., via gh CLI inside a session). PRs the developer opens manually afterward don't register. Worth knowing before you wire alerts around "team X opened zero PRs this week."
OTEL_RESOURCE_ATTRIBUTES is the lever for org-wide visibility. Setting attributes like team=, cost_center=, project= at SDK startup propagates as Prometheus labels and gives you per-team / per-project rollups out of the same metrics. The per-user data is exposed (user.id is included by default); what you do with that is your org's call.
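If you generate this per team from a wrapper or provisioning script, a small helper along these lines can build the value. It is a sketch: `build_resource_attributes` is a made-up name, the attribute values are invented, and it assumes the usual comma-separated key=value format with percent-encoded values.

```python
from urllib.parse import quote


def build_resource_attributes(attrs):
    """Render a dict as an OTEL_RESOURCE_ATTRIBUTES value (comma-separated key=value)."""
    parts = []
    for key, value in attrs.items():
        # Percent-encode values so spaces, commas and '=' cannot break the pair list.
        parts.append(f"{key}={quote(str(value), safe='')}")
    return ",".join(parts)


# Hypothetical team metadata.
env_value = build_resource_attributes(
    {"team": "platform", "cost_center": "cc-1234", "project": "billing api"}
)
print(f"OTEL_RESOURCE_ATTRIBUTES={env_value}")
```

The printed line can go into whatever sets the environment for Claude Code (shell profile, managed settings, a launcher script). Keep the label set small, since every distinct value adds series on the Prometheus side.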
Cache hit ratio is the single biggest lever on monthly cost. It is the difference between a sustainable bill and an alarming one. Worth a dedicated panel.
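For the panel itself, the arithmetic is small. Here it is as a Python sketch, under one assumed definition of "hit ratio": cache-read tokens over all prompt-side tokens (cache reads, cache writes, uncached input). The token-type names and the counts are assumptions; adjust them to whatever labels your metrics carry.

```python
def cache_hit_ratio(tokens_by_type):
    """Share of prompt-side tokens that were served from cache."""
    read = tokens_by_type.get("cacheRead", 0)
    prompt_side = (
        read
        + tokens_by_type.get("cacheCreation", 0)
        + tokens_by_type.get("input", 0)
    )
    # Avoid dividing by zero when a window has no traffic.
    return read / prompt_side if prompt_side else 0.0


# Invented token totals for one team over one window.
window = {"input": 2_000, "cacheCreation": 8_000, "cacheRead": 90_000, "output": 5_000}
ratio = cache_hit_ratio(window)
print(f"cache hit ratio: {ratio:.0%}")
```

Output tokens are left out on purpose, since they are never cacheable and would only dilute the ratio.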
Hope this helps a few in here! 😄 Cheers