
The new 5-hour quota completely killed Z.ai for dev workflows. 5M tokens used and I'm locked out.
Is anyone else ready to completely jump ship on Z.ai after these latest rate-limit changes?
Up until last week, I was easily burning through 20 to 21 million tokens over a few days working on large codebase refactors and agentic loops using GLM. I never had an issue because we only had to worry about the weekly limit pool, which let you distribute heavy workloads over a sprint.
Then the updates rolled out, and they stealth-enforced a hard 5-hour rolling session quota.
Today, I hit 100% of my 5-hour limit at just 5–6 million tokens used. Because GLM-5 and 5.1 pull massive context windows (up to 200k tokens) and use deep reasoning paths, multi-turn coding agents like Claude Code or OpenClaw instantly chew through this new session cap in just a few iterations.
What is the point of selling a coding plan built for "long-horizon task execution" if a standard development session locks you out for hours at a time? It completely ruins the rhythm of engineering work.
For those using Z.ai for serious, high-volume production coding:
- Are you planning to stay and deal with the 5-hour lockouts, or is it time to look elsewhere?
- Are you migrating to heavy API-based scaling on other providers, or just shifting workloads to local clusters (e.g., hosting your own deep-thinking setups)?
I can't have my entire IDE and terminal setup freezing mid-problem because of a hidden rolling clock. Let me know how you guys are handling this.