r/ZaiGLM

▲ 5 r/ZaiGLM

Alternatives to GLM?

Hi folks, I got a good deal: I subscribed to Pro for $30/month before they doubled the prices. However, that is far from enough, as I hit the 5-hour limit very fast. At current prices I won't subscribe to Max; paying $160/month is insane.

I know they have cheap subscriptions in China, but I can't subscribe there (no KYC, no phone number, no way to pay with WeChat, etc.).

So here comes the question: does anyone know a token plan with the same quality but better pricing?

u/PitchSuch — 10 hours ago
▲ 4 r/ZaiGLM

I didn't know about this quota system. How is this 3x Claude Pro?

https://preview.redd.it/f4zd8pk7vg2h1.png?width=546&format=png&auto=webp&s=6b4a2768c165f10d9f38d91507c442dcb9616d43

I didn't know about these limits on my Lite plan. I thought it was me burning through my tokens so quickly. Then today I read this: the help tip says GLM 5.1 and 5 Turbo eat quota at 3x during peak hours.
When did they update this? Or is it just me who missed this tip?
I think it's kind of cheating, because the Lite plan is now nowhere near 3x Claude Pro.
The usage limits are very similar to the $20 Codex Plus plan. Its models are better than the GLM series, and it provides native image understanding, ChatGPT features, image generation, and tons of other features on the web, which makes it better value than this Lite plan for just 2 dollars extra.
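To make the 3x tip concrete, here is a rough sketch of how a peak-hour multiplier shrinks what a quota window actually buys. Only the 3x factor comes from the help tip; the quota of 120 units and the function name are made up for illustration, and the real accounting may differ.

```python
def effective_requests(quota_units: int, multiplier: float) -> float:
    """How many requests fit in a quota when each one is charged `multiplier` units."""
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    return quota_units / multiplier


# Hypothetical quota of 120 units per window (not an official number).
off_peak = effective_requests(120, 1.0)
peak = effective_requests(120, 3.0)  # 3x deduction, as described in the help tip
print(f"off-peak: {off_peak:.0f} requests, peak: {peak:.0f} requests")
```

Under that assumption, the same window covers a third as many requests during peak hours.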

It's a scam for early adopters. 😞

u/Mayanktaker — 13 hours ago
▲ 51 r/ZaiGLM+3 crossposts

Recently a lot of LLM providers have started capping their coding plans with limited quotas, and I wanted to know how much I would need to spend on API keys for the same usage on a daily/monthly basis.

I decided to make this extension to visualize all my token usage based on actual usage from kilocode, so I can correctly model the estimated cost if I were to use API keys.

I also added a status bar item to show the z.ai provider quota usage before it resets.
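The core of that kind of estimate is simple arithmetic over token counts. Below is a minimal sketch, not the extension's actual code: the `Usage` type, the function name, the token counts, and the per-million-token prices are all placeholders, so plug in your own logs and your provider's price list.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int


def estimate_cost(usage: Usage, input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the API cost in dollars for the given token counts."""
    return (
        usage.input_tokens / 1_000_000 * input_price_per_m
        + usage.output_tokens / 1_000_000 * output_price_per_m
    )


# Placeholder numbers: one day of usage and prices in dollars per million tokens.
daily = Usage(input_tokens=4_000_000, output_tokens=500_000)
cost = estimate_cost(daily, input_price_per_m=1.0, output_price_per_m=4.0)
print(f"estimated daily cost: ${cost:.2f}, monthly (30 days): ${cost * 30:.2f}")
```

Real bills often also depend on cached-input discounts, which this sketch ignores.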

u/timx88 — 21 hours ago
▲ 11 r/ZaiGLM

The new 5-hour quota completely killed Z.ai for dev workflows. 5M tokens used and I'm locked out.

Is anyone else ready to completely jump ship on Z.ai after these latest rate-limit changes?

Up until last week, I was easily burning through 20 to 21 million tokens over a few days working on large codebase refactors and agentic loops using GLM. I never had an issue because we only had to worry about the weekly limit pool, which let you distribute heavy workloads over a sprint.

Then the updates rolled out, and they stealth-enforced a hard 5-hour rolling session quota.

Today, I hit 100% of my 5-hour limit at just 5–6 million tokens used. Because GLM-5 and 5.1 pull massive context windows (up to 200k tokens) and use deep reasoning paths, multi-turn coding agents like Claude Code or OpenClaw instantly chew through this new session cap in just a few iterations.
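For a sense of scale, here is a back-of-the-envelope sketch of how few turns fit under such a cap. It assumes every turn resends the whole conversation and that all of those input tokens count against the quota; the function and its parameters are illustrative, not how Z.ai actually meters usage.

```python
def turns_until_cap(cap_tokens: int, start_context: int, growth_per_turn: int, max_context: int) -> int:
    """Count agent turns before cumulative input tokens would exceed the cap.

    Each turn resends the whole conversation, so the context grows every turn
    until it reaches max_context.
    """
    if start_context <= 0:
        raise ValueError("start_context must be positive")
    used = 0
    context = start_context
    turns = 0
    while used + context <= cap_tokens:
        used += context
        turns += 1
        context = min(context + growth_per_turn, max_context)
    return turns


# Worst case: every turn already carries the full 200k-token window.
print(turns_until_cap(5_000_000, 200_000, 0, 200_000))  # 25
```

At a full 200k-token window, a 5M-token cap is gone after 25 turns, which an agentic loop can produce quickly.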

What is the point of selling a coding plan built for "long-horizon task execution" if a standard development session locks you out for hours at a time? It completely ruins the rhythm of engineering work.

For those using Z.ai for serious, high-volume production coding:

  1. Are you planning to stay and deal with the 5-hour lockouts, or is it time to look elsewhere?
  2. Are you migrating to heavy API-based scaling on other providers, or just shifting workloads to local clusters (e.g., hosting your own deep-thinking setups)?

I can't have my entire IDE and terminal setup freezing mid-problem because of a hidden rolling clock. Let me know how you guys are handling this.

https://preview.redd.it/pqty40w9i92h1.png?width=2712&format=png&auto=webp&s=1afbc03367f0275500eb66f5f52d61ed8ff5edd2

u/sudeep_dk — 1 day ago
▲ 0 r/ZaiGLM

New Low‑Cost Method to Access GLM‑5.1 from China

Purchase permanent usage credits via top‑up:

  • $1 for 1,000 requests
  • $3 for 4,000 requests
  • $9 for 12,000 requests

Credits have no expiration date.

Double‑rate deduction applies to glm‑5.1 and kimi‑k2.6: equivalent to $1 for 500 requests, $3 for 2,000 requests, $9 for 6,000 requests.

Standard single‑rate deduction applies to glm‑5, kimi‑k2.5, and minimax‑m2.5.
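As a sanity check on the quoted tiers, the per-request cost can be worked out as below. The tier figures are the ones listed in the post; the function name and the `rate` parameter are just for illustration.

```python
TIERS = {1: 1_000, 3: 4_000, 9: 12_000}  # dollars -> requests, as quoted in the post


def cost_per_request(dollars: float, requests: int, rate: int = 1) -> float:
    """Dollars per request when each call deducts `rate` credits."""
    return dollars * rate / requests


for dollars, requests in TIERS.items():
    single = cost_per_request(dollars, requests)
    double = cost_per_request(dollars, requests, rate=2)
    print(f"${dollars} tier: ${single * 1000:.2f} (single) vs ${double * 1000:.2f} (double) per 1,000 requests")
```

By this arithmetic the $3 and $9 tiers cost the same per request, so the larger top-up brings no additional discount.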

No restrictions such as 5‑hour, weekly, or monthly usage limits.

Supports payment via credit cards such as Visa. Taxes included.

website: relay-ai.cc

▲ 6 r/ZaiGLM+2 crossposts

Route Xcode Agent to GLM, DeepSeek & more with ProxyPilot

It's been a few months since the original Xcode 26.3 release that introduced agentic coding inside Xcode, but the original issues (especially massive context bloat and opaque tool calling) persist into v26.5. And we are still officially locked into using Claude or Codex within Xcode's agentic harness.

ProxyPilot is a free and open-source utility for macOS that gives devs the ability to change what upstream model the Xcode Agent talks to, preserving tool-calling and chained requests across dozens of providers. It sits between Xcode and the upstream API. Xcode thinks it's talking to a supported provider, but your requests actually go to whichever model you pick.

It also supports local inference via LM Studio and Ollama.

The latest version enables (optional) full prompt logging, session history, and more support for additional providers. Since Xcode does not show the actual inputs/outputs of your agent chats, ProxyPilot can show the full content of what the LLM providers actually see, allowing you more control over how to design effective prompts and save on quota and tokens.

ProxyPilot v1.8.9 for macOS Tahoe

u/myeleventhreddit — 2 days ago
▲ 30 r/ZaiGLM

We built an open-source context engine for coding agents that got GLM to solve SWE Bench tests Opus could not solve, here's how:

So, after several weeks of frustration with Claude Code and token spend, we came up with a thesis: with the right context, an open-weight model could match a frontier model on coding. So we decided to build Bitloops to test it.

Bitloops is an open-source memory and context layer for coding agents. We benchmarked it: GLM 5.1 on OpenCode paired with Bitloops scored 88% on SWE-bench Verified (for the 43 Rust-specific tests). This is higher than Claude Opus 4.6's 81% on the same benchmark.

How it works:

  • Targeted context retrieval, not grep. Bitloops continuously models your codebase: structural relationships, dependencies, prior decisions. When the agent asks "how does auth work," it gets back the connected code and reasoning, not 12 random snippets. Agents query through DevQL, a typed GraphQL interface they already understand.
  • Shared memory across sessions. Most agents start every session from zero. Bitloops keeps a local knowledge layer scoped to the repo and shared across agents. Cursor in the morning, Claude Code in the afternoon, same memory.
  • Git-linked reasoning capture. Every session becomes a Checkpoint tied to your commits. Next session, the model sees why the last change was made, not just what changed. Reviewers get the developer-agent conversation next to the diff.
  • Native agent hooks. Bitloops plugs into the agent's own hook surface on Claude Code, Codex, Cursor, Gemini, Copilot, and OpenCode. Context gets injected before the model sees the prompt. No protocol indirection.
  • Local-first. Rust daemon, SQLite + DuckDB, local embeddings runtime.
  • Local dashboard: still alpha, but it can present the analysis of your codebase in different ways like code-city, architectural structure, etc.
  • Languages: works with TS / JS, Python, Rust, Go, Java, C# and PHP

Apache 2.0, everything's on GitHub: https://github.com/bitloops/bitloops

Happy to dig into the architecture, the hook integration, or the benchmark methodology.

u/mastagio — 3 days ago
▲ 11 r/ZaiGLM

Is the Z.Ai (GLM 5.1) $18 plan a good use case?

Is there a clear limit on tokens or requests in this plan? I was subscribed to the $30 plan, but my automatic renewal was turned off, and because of that I was surprised by the price increase ($72).

u/Intelligent-Taste-36 — 4 days ago
▲ 0 r/ZaiGLM

GLM API CODING PLAN request timed out

Hey everyone,
I’m having a lot of issues with request timeouts lately and I’m trying to figure out if it’s just me or something broader with GLM coding plan.

I tested it in multiple environments:

  • My dev-machine monolith running with Docker
  • Native Windows setup (I switched to rule out Docker/networking issues)
  • OpenCode
  • PI agent (this one failed way more often)

The behavior is pretty similar across setups: random request timeouts, stalled responses, and sometimes requests just hanging indefinitely.

At this point I’m not convinced it’s an environment issue anymore, since I reproduced it basically everywhere. Curious if anyone else has been experiencing the same thing recently, or if there’s some known workaround/configuration tweak I’m missing before I sacrifice another evening to the distributed systems gods. Humanity really looked at networking and decided “packet loss but with feelings” was an acceptable foundation for civilization.
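One generic client-side mitigation for this kind of symptom is to set an explicit timeout on each request and retry with exponential backoff. The sketch below is not specific to the GLM API: `call_with_retries` and its parameters are hypothetical names, and it assumes your client raises `TimeoutError` when a request times out, so swap in whatever exception your HTTP library actually uses.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on TimeoutError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return request()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries, surface the timeout to the caller
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```

This only hides transient stalls. If requests hang regardless of environment, the problem is likely on the provider side and retries will just burn more quota.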

u/Clean-End2770 — 3 days ago
▲ 18 r/ZaiGLM

GLM limits got me flabbergasted

Literally confused right now. I tried GLM5.1 for the first time today on the $72 Pro plan. I have Codex 20x, so all my coding is done with GPT5.5 High. I used GLM5.1 on Hermes for project overview, research, and planning. Within one day I got rate limited, with 27% of the weekly quota already used. I was finally happy to have found something reliable compared with lazy Opus 4.6/4.7, but the rate limit is insane. Another turn-off was the lack of direct image comprehension; an MCP had to be used. Other than that, GLM5.1 is impressive so far. I would pick GLM5.1 over Claude any day, fr. Seems this scam rate limit is a widespread issue.

u/Azamat0212 — 5 days ago
▲ 3 r/ZaiGLM

I found the same AI service in China… but with DOUBLE the upload limit

So I stumbled on something interesting while exploring Z.ai.

I noticed that the Z.ai logo actually points to chatglm.cn. Out of curiosity, I followed the link… and ended up on what looks like the same service, but in a Chinese version.

A few differences stood out immediately:

  • It supports login via WeChat
  • The interface is clearly tailored for the Chinese market

But here’s the part that really caught my attention:

  • On chatglm.cn, you can upload files up to 100 MB
  • On Z.ai, the limit is only 50 MB

Same product (or at least very similar), but different limits depending on the region.

Now I’m wondering:
Is this a technical limitation?
A business decision?
Or just different deployment configurations?

Curious if anyone else has noticed this kind of behavior with other AI tools!

u/s2dar — 4 days ago
▲ 3 r/ZaiGLM

Less hallucination, but a bigger rabbit-hole digger

I think I found the rate limit issue. GLM5.1 seriously drifts into rabbit-hole digging like a menace, which burns through the limit like a maniac. This is purely a GLM5.1 model issue, not a user issue.

It's an autonomous model, designed to execute nonstop, but that is taxing for users who are paying for usage. One could say this is equivalent to burning money on the porch.

If you have a project with long dependency chains and a deeply layered architecture, it will dig the shit out of it, report it all back to usage tracking, and hit you with the rate limit. So even buying a 20x plan won't help you.

u/Azamat0212 — 5 days ago
▲ 7 r/ZaiGLM

ENABLE_TOOL_SEARCH: "true" is working in Claude Code again (in case you missed it)

Just like the title suggests, setting ENABLE_TOOL_SEARCH: "true" in the settings.json file stops Claude Code from loading all MCPs upfront and only loads an MCP when it is needed, which saves a lot of context. This had not been working for about a month or so, but I tried it again now and it worked, saving me about 20% of the context window at startup (which also reduces usage over the whole session, and those 20% savings add up).
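For anyone who wants to script the change, here is a small sketch that merges the flag into an existing settings file without clobbering other keys. It assumes the flag is the environment variable `ENABLE_TOOL_SEARCH` (all uppercase) and that it belongs under the `env` key of settings.json; both the helper names and that placement are assumptions, so check your own setup before running it.

```python
import json
from pathlib import Path


def enable_tool_search(settings: dict) -> dict:
    """Return a copy of `settings` with ENABLE_TOOL_SEARCH set to "true" under "env"."""
    updated = dict(settings)
    env = dict(updated.get("env", {}))  # keep any env entries that are already there
    env["ENABLE_TOOL_SEARCH"] = "true"
    updated["env"] = env
    return updated


def update_settings_file(path: Path) -> None:
    """Read a settings.json file, apply the flag, and write it back."""
    current = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(enable_tool_search(current), indent=2) + "\n")
```

Usage would look like `update_settings_file(Path.home() / ".claude" / "settings.json")`, where that path is also an assumption about where your settings file lives.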

hope this helps someone

u/muhamedyousof — 4 days ago
▲ 11 r/ZaiGLM

Does GLM 5.1 Train on Your Data?

I'm working on something private. I'm wondering: if I give it some docx files, will it train on the data or keep it within our context?

u/Elibroftw — 5 days ago
▲ 0 r/ZaiGLM

Does Z.ai refund first-time buyers who never used GLM5.1?

I know they publicly state no refunds for the token plan, but do they make an exception for first-time users? Reaching 28% weekly usage on day 1, without any coding, is insane. Do they ever reply to their emails?

u/Azamat0212 — 4 days ago
▲ 17 r/ZaiGLM

GLM 5.1 comparison on wafer pass vs zai

Both plans compared. There are hiccups, but overall it's decent enough to be used as my primary provider. I've been thinking about cancelling the legacy plan, but I'll hold off for just a while longer.

u/founders_keepers — 7 days ago
▲ 10 r/ZaiGLM

Is there a way to estimate how many users have left Z.AI since price change?

Curious to know if there is a way to see usage/users who left based on the inconsistencies in the actual product (Uptime/rate limits etc.) and the overnight bamboozle of more than doubling prices.

Maybe this was an international thing and the Chinese market is propping them up, but in my real day-to-day, anecdotally it seems like 80%-90% of people I know in the dev community who used Z.Ai have cancelled their plan.

Does anyone know of any potential way, even a rough estimate, to figure that out? I'm really keen to do a case study on them.

u/Cute_Dragonfruit4738 — 8 days ago