u/sudeep_dk

Is anyone else ready to completely jump ship on Z.ai after these latest rate-limit changes?

Up until last week, I was easily burning through 20 to 21 million tokens over a few days working on large codebase refactors and agentic loops using GLM. I never had an issue because we only had to worry about the weekly limit pool, which let you distribute heavy workloads over a sprint.

Then the updates rolled out, and they stealth-enforced a hard 5-hour rolling session quota.

Today, I hit 100% of my 5-hour limit at just 5–6 million tokens used. Because GLM-5 and 5.1 take very large contexts (up to 200k tokens) and spend extra tokens on deep reasoning, multi-turn coding agents like Claude Code or OpenClaw chew through this new session cap in just a few iterations.
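For a rough sense of scale, here is a back-of-the-envelope sketch in Python. The 5.5M cap is just the midpoint of the 5–6M where I got cut off, and it assumes every turn re-sends its full context and that all of those tokens count against the quota. None of that is documented Z.ai behavior, and `turns_until_cap` is a made-up helper name.

```python
def turns_until_cap(cap_tokens: int, context_per_turn: int, output_per_turn: int = 0) -> int:
    """Number of whole agent turns that fit under a session cap.

    Assumes (not confirmed by Z.ai) that each turn is billed for its
    full input context plus its output tokens.
    """
    per_turn = context_per_turn + output_per_turn
    if per_turn <= 0:
        raise ValueError("per-turn token usage must be positive")
    return cap_tokens // per_turn


# Assumed cap: midpoint of the 5-6M tokens observed before lockout.
SESSION_CAP = 5_500_000

for ctx in (50_000, 100_000, 200_000):
    turns = turns_until_cap(SESSION_CAP, ctx)
    print(f"{ctx:>7,} tokens/turn -> {turns} turns")
```

Under these assumptions, an agent that keeps a full 200k context on every turn gets fewer than 30 turns per 5-hour window, which a tool-calling loop can burn through quickly.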

What is the point of selling a coding plan built for "long-horizon task execution" if a standard development session locks you out for hours at a time? It completely ruins the rhythm of engineering work.

For those using Z.ai for serious, high-volume production coding:

Are you planning to stay and deal with the 5-hour lockouts, or is it time to look elsewhere?
Are you migrating to API-based usage on other providers, or shifting workloads to local clusters (e.g., self-hosting your own deep-thinking setups)?

I can't have my entire IDE and terminal setup freezing mid-problem because of a hidden rolling clock. Let me know how you guys are handling this.
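One stopgap is to keep your own clock on the client side. Below is a minimal sketch of a rolling-window token tracker in Python. It assumes the quota is a true sliding 5-hour window and that you can read token counts from each response; I don't know how Z.ai actually computes the window, and `RollingTokenTracker` is a hypothetical helper, not anything from their SDK.

```python
import time
from collections import deque

WINDOW_SECONDS = 5 * 60 * 60  # the 5-hour window described above


class RollingTokenTracker:
    """Client-side estimate of tokens used in a sliding time window.

    Assumption: the provider uses a true sliding window. If it is really
    a fixed window that starts at the first request, this will be off.
    Calls to record() are expected in non-decreasing time order.
    """

    def __init__(self, cap_tokens, window_seconds=WINDOW_SECONDS):
        self.cap_tokens = cap_tokens
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, tokens), oldest first
        self._total = 0

    def _expire(self, now):
        # Drop usage that has aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            _, tokens = self._events.popleft()
            self._total -= tokens

    def record(self, tokens, now=None):
        now = time.time() if now is None else now
        self._expire(now)
        self._events.append((now, tokens))
        self._total += tokens

    def used(self, now=None):
        now = time.time() if now is None else now
        self._expire(now)
        return self._total

    def remaining(self, now=None):
        return max(0, self.cap_tokens - self.used(now))
```

Call `record()` with the token usage from each response and check `remaining()` before kicking off a long agent run, so a lockout at least doesn't come as a surprise mid-problem.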

https://preview.redd.it/pqty40w9i92h1.png?width=2712&format=png&auto=webp&s=1afbc03367f0275500eb66f5f52d61ed8ff5edd2

The new 5-hour quota completely killed Z.ai for dev workflows. 5M tokens used and I'm locked out.