Swapped out Sonnet for GLM 5.1 and K2.6 in Claude Code for a week
The recent subsidy posts here got under my skin. Yeah, the 5-hour limits went back up earlier this month, but that didn't really answer the question, it just made it less urgent. So last week I kept Claude Code but pointed ANTHROPIC_BASE_URL at a different provider and ran GLM 5.1 plus K2.6 for the whole week. Both came out in April, so I figured the early integration bugs would mostly be worked out.
It's a Go service I've been working on for a while. Normal week of refactors plus some test scaffolding and a couple new endpoints. Same stuff I'd usually have Sonnet do. Set GLM 5.1 as the default in the env vars, used K2.6 when I needed wider context across files. Went with one of the Anthropic-compatible aggregator routes rather than wiring two providers separately, because I didn't want to rewrite my session scripts.
GLM 5.1 surprised me. I'd written off the benchmark hype as PR, but for the kind of day-to-day refactor work I do, the gap to Sonnet wasn't really noticeable after a couple days. It's more verbose than Sonnet and double-checks itself a lot more than I'd like. I can't really speak to the frontend agent stuff people are excited about because I don't do enough of it.
K2.6 was solid for the wide-context tasks. I fed it about 80k tokens for a migration across a few packages and it tracked references correctly. The weak spot is the same one I hit with every open model: custom tools with three or four nested args. Sonnet handles those fine, K2.6 needs a retry maybe a quarter of the time.
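For anyone hitting the same nested-args retries, here's a rough sketch of what strict validation on the tool side could look like in Go, so a malformed call fails with a specific message instead of a vague one. Everything in it is made up for illustration: MigrateArgs, its fields, and parseArgs are hypothetical, and the "flattened options" payload is just one plausible failure shape, not something I measured.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// MigrateArgs is a hypothetical tool input with three levels of nesting.
type MigrateArgs struct {
	Package string `json:"package"`
	Target  struct {
		Table   string `json:"table"`
		Options struct {
			DryRun    bool `json:"dry_run"`
			BatchSize int  `json:"batch_size"`
		} `json:"options"`
	} `json:"target"`
}

// parseArgs decodes strictly: unknown fields are rejected, and required
// fields are checked by their full path so the error names the exact spot.
func parseArgs(raw []byte) (MigrateArgs, error) {
	var a MigrateArgs
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&a); err != nil {
		return a, fmt.Errorf("invalid tool arguments: %w", err)
	}
	if a.Package == "" {
		return a, errors.New("missing required field: package")
	}
	if a.Target.Table == "" {
		return a, errors.New("missing required field: target.table")
	}
	if a.Target.Options.BatchSize <= 0 {
		return a, errors.New("target.options.batch_size must be > 0")
	}
	return a, nil
}

func main() {
	// Hypothetical slip: options flattened one level up instead of nested.
	bad := []byte(`{"package":"billing","target":{"table":"invoices","dry_run":true,"batch_size":500}}`)
	if _, err := parseArgs(bad); err != nil {
		fmt.Println("rejected:", err)
	}

	// The correctly nested form of the same call.
	good := []byte(`{"package":"billing","target":{"table":"invoices","options":{"dry_run":true,"batch_size":500}}}`)
	if a, err := parseArgs(good); err == nil {
		fmt.Println("accepted, batch size:", a.Target.Options.BatchSize)
	}
}
```

This doesn't make the model any better at nesting, it just means the retry gets a precise error to work from.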
Sonnet's hallucinations are sneaky. It'll invent a function signature that looks like something the library would have. GLM's are louder: the syntax is fine but the package it references isn't in your imports, so it gets caught right away. Bad in different ways, but I'd rather have the loud kind in review.
One thing that tripped me up early. The model env var names in Claude Code are tied to Sonnet and Opus, so when I set ANTHROPIC_DEFAULT_SONNET_MODEL to GLM, I forgot the Opus one was still pointing at the Anthropic default, and anything routed to Opus was silently falling back. Burned a chunk of the first morning before I noticed. Make sure you set every model env var, not just the obvious one.
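If you want a guard against that, a tiny preflight check before launching a session would do it. This is a minimal sketch in Go. Only ANTHROPIC_BASE_URL and ANTHROPIC_DEFAULT_SONNET_MODEL are names I actually used above; the OPUS and HAIKU variable names are my assumption of the matching pattern, so check them against the current Claude Code docs. missingVars is just a helper I made up for this.

```go
package main

import (
	"fmt"
	"os"
)

// modelVars lists the per-tier model variables to check.
// Assumption: the OPUS and HAIKU names follow the same pattern as the
// SONNET one; verify against the Claude Code documentation.
var modelVars = []string{
	"ANTHROPIC_DEFAULT_SONNET_MODEL",
	"ANTHROPIC_DEFAULT_OPUS_MODEL",
	"ANTHROPIC_DEFAULT_HAIKU_MODEL",
}

// missingVars returns the names for which lookup yields an empty value.
// lookup is injected so the check can be exercised without touching the
// real environment.
func missingVars(names []string, lookup func(string) string) []string {
	var missing []string
	for _, n := range names {
		if lookup(n) == "" {
			missing = append(missing, n)
		}
	}
	return missing
}

func main() {
	names := append([]string{"ANTHROPIC_BASE_URL"}, modelVars...)
	if missing := missingVars(names, os.Getenv); len(missing) > 0 {
		fmt.Fprintln(os.Stderr, "unset, may fall back to defaults:", missing)
		os.Exit(1)
	}
	fmt.Println("all model variables set")
}
```

Run it at the top of whatever script starts your session and bail out on a non-zero exit. It only catches unset variables, not ones pointing at the wrong model.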
On cost. Can't give a clean comparison because subscription vs subscription is messy. But the same week of work that usually has me watching my Claude Code session burn down by Friday afternoon felt fine on the new setup. Not the meme-y "I saved 75%" story, but not a small difference either.
Latency is the one thing that hasn't really faded. Sonnet you don't notice, you just work. GLM is close. K2.6 has this little pause before each tool call, which fades in batch work but stands out when you're typing back and forth. Don't see that in any benchmark.
Anyway. Subsidy threads were what got me to actually try it instead of speculating.