Claude Code Cost Analysis: Cache ReWarming Write Costs from Session Inactivity
I'm sure this is fairly widespread knowledge, but for the few of us who didn't know, I thought I'd have Claude share a bit of our deep dive into costs on some projects I've been working on. Long story short: the 5-minute TTL on prompt caching means that if you often tab away, get distracted, or take breaks from your current project (like I do 5-10 times per day), your costs add up significantly from the cache writes needed to rewarm your big bloated cache (okay, my caches are big and bloated, I'm sure yours aren't). I didn't think about it too hard until I noticed that my output tokens alone couldn't account for what I was spending.
----- From Claude
Summary
In Claude Code, cache reads and writes — not output tokens — dominate API spend. The prompt cache has a 5-minute TTL. Each period of inactivity exceeding this TTL triggers a full-context cache write at 1.25× the base input rate. For sessions with frequent idle gaps, cache writes can approach or exceed cache read costs, roughly doubling the caching bill relative to a continuously-active session.
Observed Data
41-day Sonnet 4.6 session (damn! did I really use the same session for 41 days?), context cleared periodically via /clear, multiple daily idle gaps:
| Component | Tokens | $/MTok | Cost |
|---|---|---|---|
| Input | 19.1K | $3.00 | $0.06 |
| Output | 1.1M | $15.00 | $16.50 |
| Cache read | 353.2M | $0.30 | $105.96 |
| Cache write | 27.7M | $3.75 | $103.88 |
| Total | | | $227.02 |
Output tokens account for ~7% of total cost. Cache operations account for ~93%.
Without caching, the ~380M tokens of repeated context would cost ~$1,140 at standard input rates. Caching reduced this to ~$210 — but the write component ($104) is nearly equal to the read component ($106), indicating frequent cache invalidation.
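The breakdown above can be rechecked with a few lines of Python. This is only a sketch: the token counts and rates are copied from the table, while the names `ROWS` and `cost_usd` are made up for illustration.

```python
# Token counts and $/MTok rates copied from the table above.
ROWS = {
    "input":       (19_100,      3.00),
    "output":      (1_100_000,   15.00),
    "cache_read":  (353_200_000, 0.30),
    "cache_write": (27_700_000,  3.75),
}

def cost_usd(tokens, rate_per_mtok):
    """Cost in dollars for a token count billed at a $/MTok rate."""
    return tokens / 1_000_000 * rate_per_mtok

costs = {name: cost_usd(tok, rate) for name, (tok, rate) in ROWS.items()}
total = sum(costs.values())
cache_cost = costs["cache_read"] + costs["cache_write"]

# Counterfactual: every cached token billed at the base input rate instead.
cached_tokens = ROWS["cache_read"][0] + ROWS["cache_write"][0]
uncached_cost = cost_usd(cached_tokens, ROWS["input"][1])

for name, c in costs.items():
    print(f"{name:12s} ${c:8.2f}  {c / total:6.1%}")
print(f"cache total  ${cache_cost:8.2f}")
print(f"no caching   ${uncached_cost:8.2f}")
```

The rounded rows sum to about $226.39, slightly under the reported $227.02. That gap is presumably because the token counts in the table are rounded, though that is a guess.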
Mechanism
Each API call in Claude Code transmits the full prefix: system prompt, tool definitions, project configuration, and conversation history. When the cache is warm, this prefix is read at $0.30/MTok. After a >5-minute gap, the prefix must be rewritten at $3.75/MTok — 12.5× the read rate.
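A rough way to see the warm/cold difference is to replay a list of request timestamps against the TTL. The sketch below is a simplified model, not how billing is actually computed: it assumes each request refreshes the TTL, that the first request is always cold, and that the whole prefix is billed either as a read or as a write (the small per-turn write of new tokens is ignored). The function name and the sample session are hypothetical.

```python
TTL_SECONDS = 5 * 60   # cache lifetime from the text
READ_RATE = 0.30       # $/MTok when the prefix is warm
WRITE_RATE = 3.75      # $/MTok when the prefix must be rewritten

def prefix_cost(request_times, context_tokens):
    """Return (cold_starts, cost_usd) for requests at the given times (seconds).

    Simplification: a request is cold if it is the first one or if more
    than TTL_SECONDS passed since the previous request.
    """
    cold_starts = 0
    cost = 0.0
    last = None
    for t in sorted(request_times):
        cold = last is None or (t - last) > TTL_SECONDS
        rate = WRITE_RATE if cold else READ_RATE
        cold_starts += cold
        cost += context_tokens / 1_000_000 * rate
        last = t
    return cold_starts, cost

# Hypothetical session: 3 quick requests, a 10-minute break, then 3 more.
with_break = prefix_cost([0, 60, 120, 720, 780, 840], 100_000)
# Same six requests, one minute apart, with no break.
no_break = prefix_cost([0, 60, 120, 180, 240, 300], 100_000)
print(with_break, no_break)
```

Under these assumptions the session with one break costs about $0.87 for the prefix versus about $0.53 without it: a single idle gap adds one extra 100K-token write.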
With an estimated 200-400 cold starts over 41 days and average context size of ~100K tokens at time of invalidation: ~300 × 100K × $3.75/MTok ≈ $112.50, consistent with the observed $104.
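The same estimate can be run across the whole 200-400 range. This is a back-of-the-envelope sketch: `rewarm_cost` is a hypothetical helper, and the 100K average context is the estimate from the text, not a measured value.

```python
WRITE_RATE = 3.75  # $/MTok, cache write rate from the table

def rewarm_cost(cold_starts, avg_context_tokens):
    """Total cost of rewriting the cached prefix after each cold start."""
    return cold_starts * avg_context_tokens / 1_000_000 * WRITE_RATE

for n in (200, 300, 400):
    print(n, f"${rewarm_cost(n, 100_000):.2f}")

# Cold starts implied if all 27.7M written tokens were 100K-token rewarms.
implied_cold_starts = 27_700_000 / 100_000
print(implied_cold_starts)
```

The range works out to $75 to $150, which brackets the observed $104. Dividing the 27.7M written tokens by 100K gives about 277 cold starts. That figure is best read as an upper bound, since some cache writes come from ordinary new turns rather than rewarming.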
Mitigation
- /compact before idle periods. Compaction summarizes conversation history, reducing context size. A 150K→20K compaction reduces the next cold-start write from ~$0.56 to ~$0.075.
- /compact over /clear for related work. /clear guarantees a cold start with no context preservation. /compact retains relevant state in fewer tokens.
- Minimize file reads into context. Use targeted tools (grep, head, symbol search) rather than reading entire files. Each file read persists in context and inflates every subsequent cache operation.
- Compact proactively at ~60% context capacity rather than waiting for auto-compaction near the limit.
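The compaction numbers above come from the same write-rate arithmetic. The sketch below uses hypothetical helper names and leaves out the cost of the compaction request itself, which is not measured in this analysis.

```python
WRITE_RATE = 3.75  # $/MTok, cache write rate from the table

def cold_write_cost(context_tokens):
    """Cost of one full-context cache write after the TTL has expired."""
    return context_tokens / 1_000_000 * WRITE_RATE

def compaction_saving(before_tokens, after_tokens, cold_starts=1):
    """Write cost avoided by compacting before each of `cold_starts` idle gaps.

    Assumption: the cost of running the compaction itself is not included.
    """
    per_gap = cold_write_cost(before_tokens) - cold_write_cost(after_tokens)
    return cold_starts * per_gap

print(f"${cold_write_cost(150_000):.4f}")             # write at 150K context
print(f"${cold_write_cost(20_000):.4f}")              # write at 20K context
print(f"${compaction_saving(150_000, 20_000):.4f}")   # saved per idle gap
```

That comes to roughly $0.49 saved per idle gap for a 150K→20K compaction. The saving shrinks in proportion when the context is smaller at the moment you step away.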
The single highest-leverage habit: type /compact before stepping away from the terminal.