after almost getting a surprise bill i started logging every interaction by model and task type. ran this for 14 days on my telegram + discord agent
heartbeats (every 30 mins, 672 total)... 38% of my token usage. was running on opus. genuinely insane waste for a status ping
file reads and summaries... 29% of usage. also on opus. flash handles this identically
actual conversations where model quality mattered... 22% of usage
complex tasks where opus was genuinely better than flash... 11% of usage
so 67% of my spend was on tasks where the cheapest model (v4 flash at $0.14/M) would have been identical quality to opus ($6.75/M effective after tokenizer)
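if you want to run the same kind of tracking yourself, here's a minimal sketch of what per-model, per-task logging could look like. this is not the actual script from the experiment above; the class and method names (UsageLog, record, share_by_task) are made up for illustration, and it assumes you can get a token count for each call from your provider's response.

```python
from collections import defaultdict


class UsageLog:
    """Accumulates token counts per (model, task_type) pair. Hypothetical helper."""

    def __init__(self):
        self.tokens = defaultdict(int)

    def record(self, model, task_type, tokens):
        """Add the tokens used by one interaction to its (model, task_type) bucket."""
        self.tokens[(model, task_type)] += tokens

    def share_by_task(self):
        """Return each task type's fraction of all logged tokens."""
        total = sum(self.tokens.values())
        if total == 0:
            return {}
        by_task = defaultdict(int)
        for (_model, task_type), count in self.tokens.items():
            by_task[task_type] += count
        return {task: count / total for task, count in by_task.items()}


if __name__ == "__main__":
    log = UsageLog()
    # Token counts below are placeholder numbers, not real measurements.
    log.record("anthropic/claude-opus-4-7", "heartbeat", 380)
    log.record("anthropic/claude-opus-4-7", "file_read", 290)
    log.record("anthropic/claude-opus-4-7", "conversation", 330)
    for task, share in sorted(log.share_by_task().items()):
        print(f"{task}: {share:.0%}")
```

call record() once per interaction and check share_by_task() every few days. the breakdown gets obvious pretty fast.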
the fix... switch your primary model to deepseek/deepseek-v4-flash in your openclaw.json under agents.defaults.model.primary. then use /model anthropic/claude-opus-4-7 mid-session only when you actually need it for something hard. switches instantly, no restart, same session. type /model deepseek/deepseek-v4-flash when you're done with the hard part and go back to cheap
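for the config part, a small sketch of setting that key programmatically. this assumes openclaw.json is plain JSON and that agents.defaults.model.primary nests as agents -> defaults -> model -> primary; the helper name set_primary_model is hypothetical, and you'd need to point it at wherever your own config file lives.

```python
import json


def set_primary_model(config, model):
    """Set agents.defaults.model.primary, creating missing levels along the way.

    Assumes each level of the path is a JSON object (dict) if it already exists.
    """
    node = config
    for key in ("agents", "defaults", "model"):
        node = node.setdefault(key, {})
    node["primary"] = model
    return config


if __name__ == "__main__":
    # In practice you would json.load() your existing openclaw.json here
    # and json.dump() the result back; an empty dict keeps the sketch standalone.
    config = {}
    set_primary_model(config, "deepseek/deepseek-v4-flash")
    print(json.dumps(config, indent=2))
```

editing the file by hand works just as well. the point is only that the primary model is one key, and everything else in the config stays untouched.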
went from ~$170/month to about $35 with this approach. the quality difference on heartbeats, file reads, and simple questions is genuinely zero
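a rough sanity check on those numbers, as a sketch. it assumes everything ran on opus before the switch, that token share maps directly to spend share, and that one blended price per million tokens is a fair simplification. the prices and percentages are the ones quoted above; the function names are made up for illustration.

```python
# Shares of token usage from the 14-day log.
SHARES = {
    "heartbeats": 0.38,
    "file_reads": 0.29,
    "conversations": 0.22,
    "complex_tasks": 0.11,
}

# Dollars per million tokens, as quoted in the post.
OPUS_PRICE = 6.75
FLASH_PRICE = 0.14


def blended_price(cheap_tasks):
    """Average $/M tokens when `cheap_tasks` run on flash and the rest on opus."""
    total = 0.0
    for task, share in SHARES.items():
        price = FLASH_PRICE if task in cheap_tasks else OPUS_PRICE
        total += share * price
    return total


def projected_monthly(baseline_monthly, cheap_tasks):
    """Scale an all-opus monthly bill by the ratio of blended price to opus price."""
    return baseline_monthly * blended_price(cheap_tasks) / OPUS_PRICE


if __name__ == "__main__":
    scenarios = (
        {"heartbeats", "file_reads"},
        {"heartbeats", "file_reads", "conversations"},
    )
    for cheap in scenarios:
        print(sorted(cheap), round(projected_monthly(170, cheap), 2))
```

under those assumptions, moving only heartbeats and file reads to flash projects to roughly $58/month, and moving conversations too projects to roughly $22. the ~$35 reported above sits between the two, which fits a setup where most but not all conversations stay on flash.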
honestly the most frustrating part was spending 2 weeks manually logging everything just to find this out. i run my gmail agent on betterclaw free tier with BYOK, and they recently added an update that shows exactly how your api key is spending per task, which is genuinely a great update... caught my heartbeat waste there instantly instead of 2 weeks of manual tracking. but yeah, switching your primary to flash and /model-ing up to opus only when needed is the move