r/ZaiGLM

▲ 5 r/ZaiGLM+1 crossposts

What is the cheapest API provider for GLM 5.2?

u/m0_80 — 4 hours ago

▲ 4 r/ZaiGLM+1 crossposts

Any one ran GLM 5.2 on two M5's 128gb?

Hi, I need to see how many tokens / s did you get on 2 M5 Max 128gb?

I did test on one M5 128gb with ssd streaming and got little over 3.7 t/s so wondering if we merge two M5 max how much difference would it make in t/s

reddit.com

u/nabeelkh5 — 3 hours ago

▲ 72 r/ZaiGLM

Bought Z.ai MAX today. Unsubscribed already.

I am a Pro user since April, before the price hike. GLM 5.1 was great and turbo also.

Recently Pro is finishing the 5-hour quota in less than an hour. I was under the impression that it was because of the limits or maybe because I was an older user, before the price hike.

So I went and bought a brand-new Max account at $144 per month. Supposedly it has 4x limits compared to Pro. Pfffft...

The work both subscriptions did was sub-par, took too long for things which are basic and ate both my 5-hour limit subscriptions without even finishing the basics of a web app.

Totally and utterly ridiculous. I know the fans will start calling me a noob and blah blah blah. I have 3 business codex accounts I'm using for my project, one kimi, deepseek 4 pro, gemini and a couple small others.

I really liked GLM 5 and 5.1

The result, price and overall quality of GLM 5.2 does not worth $25 / month, let alone $144. That was money I'm never seeing again.

Extremely disappointed. GLM is supposed to be comparable to the coding and results of Claude, not do the horrendous business practice of Anthropic 1:1

Totally and utterly ridiculous

reddit.com

u/elelem-123 — 11 hours ago

▲ 3 r/ZaiGLM

Is GLM5.2 using Claude models behind? OR is it just that it thinks its using Claude models

so i was trying to test glm 5.2 on z,ai website on agent tab for comparing it to other ai models it just did a chain of thought about deploying haiku/sonnet/opus subagents to do some of the work

https://preview.redd.it/kko6194gsebh1.png?width=716&format=png&auto=webp&s=07cf6e67fb98589edf6f92625fe11cdfd5e15fcf

reddit.com

u/Present-Tree-7698 — 12 hours ago

▲ 1 r/ZaiGLM

How’s low level (kernel/C++) development with GLM?

I’ve been exceptionally well-behaved with Claude, but I’m forced to use Opus 4.7 because it’s the only “unlocked” version available. This is due to the insane classifiers that analyze everything I do, even low-level, kernel-related tasks that are not remotely suspicious or questionable.

I tested GLM 5.2 once or twice, but I didn’t “dare” to use it in serious and important development projects. What I noticed on more challenging tasks was the insane thinking loop: “But what if, okay, but then, no, but what if, okay, but, anyways, okay, maybe, no, maybe.” It couldn’t complete the task after 30 minutes, while Claude finished it within 7 minutes, as an example.

reddit.com

u/Comprehensive-Bet-83 — 9 hours ago

▲ 0 r/ZaiGLM

z.ai is dumb

https://preview.redd.it/16kmakvgidbh1.png?width=1858&format=png&auto=webp&s=5d9b1ee7ab7a09f222b88999a36c0a03f898ee10

the ai thinks we are in 2024 in other chats i asked it if it was smarter than fable 5 and it says that fable 5 by claude doesnt exist

dont buy a subscription

reddit.com

u/East-Oven4645 — 16 hours ago

▲ 0 r/ZaiGLM

Subscription sucked up my 5h while idle

Hi, I am new to z.ai I subscribed just yesterday with the lite monthly. I wanted to test the 5.2 before go for it.. Till now I used claude and still have the subscription with them.

I used z 5.2 yeasterday and was very happy with the outcome. the problem is that my 5h goes by very fast. But I am working on a complex repo so it's ok.

This morning tho I asked a couple of questions on planning and tell the model to wait while I revised the suggested plan. The problem is that after less than an hour the usage was at 100% and there were less the half token used there.. in total 5.5m. while in the other session I was closer to 15m.. Is it there anything I am missing or is it an error?

I am using opencode.

Thanks in advance

TDLR : new to z and on lite subscription. My last session overburned my time hit 100% eventho I used 1/3 of the tokens . Here to understand what am doing wrong

https://preview.redd.it/cbdk426budbh1.png?width=251&format=png&auto=webp&s=4d7e222ed4ce4f3dcb5be027283c001448e57079

reddit.com

u/geekyNut — 15 hours ago

▲ 2 r/ZaiGLM

GLM-5.2 is extremely slow through OpenRouter + Codex — is my setup wrong?

I’m using GLM-5.2 through OpenRouter inside Codex, and it feels extremely slow.

Is there anything wrong with my setup, or is this expected behavior? I’m wondering if the slowdown is coming from OpenRouter, Codex, GLM-5.2 itself, or my reasoning settings.

I first tried using Max reasoning, but it got stuck for around 30 minutes without actually writing any code, so I had to stop it manually. Then I switched to High, but it was still very slow.

The task was relatively simple. Has anyone else experienced this with GLM-5.2 on OpenRouter/Codex? Are there recommended settings to make it usable?

reddit.com

u/severe_009 — 12 hours ago

▲ 0 r/ZaiGLM

Anyone using GLM5.2 as a chat option

I want to try out the new Z.ai GLM5.2 model as chat option in VSC. It looks like this extension might work GLM Chat Provider - Visual Studio Marketplace, but not sure if its going to steal my API key and all my money along with it.

reddit.com

u/MOR300 — 15 hours ago

▲ 18 r/ZaiGLM

Help ? How to get GLM cheaply?

Hi, I was trying to get my projects done. But opencode is cutting off insane amount of credits for such little amount of tokens. Like they are not charging according to their mentioned price. Any alternatives?

https://preview.redd.it/9iz2idw9e9bh1.png?width=2286&format=png&auto=webp&s=6f5e3f3ed4848b677b511c5837a1b54a1947ccf2

reddit.com

u/Mega_mewtwo_ — 1 day ago

▲ 13 r/ZaiGLM+3 crossposts

I built an MCP gateway that lets models use Microsoft Copilot for vision and documents

I built a small MCP project and would love feedback from people using OpenCode, DeepSeek, GLM/Z.ai models, or other coding agents.

https://github.com/yurilopes/Copilot-Tools-Gateway

The basic idea is: keep your main coding model as the main agent, but let it call Microsoft Copilot as an auxiliary tool when it needs capabilities the model/tooling may not have, like vision, screenshot understanding, image generation, or document/file-assisted questions.

This is especially useful with models like GLM-5.2 or DeepSeek, where the coding/reasoning may be strong, but the surrounding tool stack may not always expose vision or document understanding.

The gateway exposes Copilot through MCP tools, so an agent like OpenCode can call things like chat, image analysis, image generation, and file-assisted questions using your own local Microsoft account session.

It is unofficial and not affiliated with Microsoft.

I would really appreciate people testing it and telling me what feels good, what feels awkward, what breaks, and what would make it more useful for real agentic coding workflows.

u/QuietPsychonaut — 20 hours ago

▲ 1 r/ZaiGLM+1 crossposts

Is GLM 5.2 a bad joke?

Ich wollte GLM 5.2 von Z.ai in meinem aktuellen VSCode-Projekt ausprobieren. Ich habe es gebeten, eine Watchlist-Funktion hinzuzufügen – einfach einen Button zum Hinzufügen eines Namens und Erstellen einer Watchlist für Aktien. Zuerst gab es Probleme, da dem Button entweder kein Event-Handler zugeordnet war oder die falsche Funktion verwendet wurde: „Fehler beim Erstellen: apiExt.createWatchlist ist keine Funktion“. Dann wurde der Fehler dreimal behoben, und beim vierten Mal funktionierte nix mehr– wegen einem Tippfehler! In meinem Code, daher lässt sich das Frontend nicht mehr kompilieren:

[plugin:vite:oxc] Transformation fehlgeschlagen mit 1 Fehler:

[PARSE_ERROR] Fehler: Erwartet wurde , oder ) , gefunden wurde aber }

╭─[ src/components/WatchlistDetailPanel.tsx:285:87 ]

│ 285 │ onMouseLeave={(e) => (e.currentTarget.style.background = "#3b82f6"}}

│ ┬ ┬

│ ╰────────────────────────────────────────────────── Hier geöffnet

│ │

│ ╰── , oder ) erwartet

https://preview.redd.it/ql9mbkst7abh1.png?width=492&format=png&auto=webp&s=9370885477ec6aa13e61e2fd79f8fb877ecb3fee

https://preview.redd.it/pv71d1we8abh1.png?width=743&format=png&auto=webp&s=58571fc97cde6edc62e668c032cbfb1bb2ebfc7e

Für „dies“ wurden 27 % meines 5-Stunden-Kontingents verbraucht… Ist das ein Witz? Ich hätte das selbst schneller und fehlerfrei hinbekommen, denke ich... Und dann das: Es hat nur den Hintergrundstil geändert? Ich fühle mich irgendwie betrogen. Gibt es eine Möglichkeit, das Geld zurückzubekommen?

EDIT: Evtl. lag es daran, dass ich Z.AI über Claude Code (mit deren "Model Mapping") konfiguriert hatte.
Ich benutze jetzt opencode mit dem Z.AI apikey und im Moment zumindest funktioniert es jetzt deutlich besser... schon komisch manchmal

reddit.com

u/Snoo_87607 — 1 day ago

▲ 3 r/ZaiGLM

Is Lite GLM Coding Plan enough?

Hey guys, I am considering trying GLM Lite plan, and may be someone who is using it can tell, how do the limits feel? Is it enough for a weekend coding, or is it like claude and codex - 5hr is gone in 30 mins?

Thank you

reddit.com

u/Ant312 — 1 day ago

▲ 63 r/ZaiGLM

GLM 5.2 with ClinePass: 61M tokens in one 5-hour coding session

so recently i bought clinepass to give it a try. I used GLM 5.2 as the main workhorse and got solid usage for the price (got it at $1.99).

I use GPT 5.5 in codex as the planner and reviewer. it writes the proposal and execution plan, then GLM 5.2 acts as the implementer.

the result was quite solid. I got two infra refactors done in a single 5-hour session. it consumed around 61M tokens for that one session. the session itself used around 20% of the monthly limit. I started at 3%, based on the image. so if we estimate from that, the monthly limit would be gone in about 5-6 sessions.

well, I can’t complain since it only cost $1.99. it’s good considering the price.

as for GLM 5.2 itself, it literally blew my mind, lol. solid executor. I had codex review the code generated by GLM, and it couldn’t find any major issues. everything aligned with the plan codex gave.

so I think this is a pretty comfortable workflow for people looking for an affordable setup: a $20 codex subscription for planning, review, and the hardest tasks, then a cheaper open-model subscription like cline or opencode for execution.

btw, I tested it on my side project: an AI token usage tracker. it’s local-first and already supports most coding agents like codex, claude code, opencode, antigravity, cline, and zcode.

if you want to give it a try:
https://github.com/fikrilal/burnly

u/usskatyusha — 1 day ago

▲ 1 r/ZaiGLM+1 crossposts

First time I have seen this: my model seemed aware of its context usage ask me for compaction!

I was in a middle of a Claude Code session with GLM 5.2. Context usage 537k/1M. After finishing a task, GLM asked me this:

>Context note: this session has run long and context is getting heavy. (...) I'd suggest either (a) continuing here while context allows (...), or (b) checkpointing now and continuing the remaining chapters in a fresh session (...) Your call — which would you prefer?

First time I have seen this! Usually it is me asking the model to get ready to continue its work after compaction. Nice to see the context is aware of its limits. I will have to investigate if I can find a way for it to trigger compaction on its own when it needs it.

Has anyone experienced something similar? (don't confuse it with auto-compaction)

reddit.com

u/ex-arman68 — 1 day ago

▲ 306 r/ZaiGLM

Fable 5 vs GPT 5.5 vs GLM 5.2 vs DeepSeek v4 Pro

Video source

u/vigneshsmarther — 2 days ago

▲ 89 r/ZaiGLM+2 crossposts

I built a tool for Hermes to help you build better UI

Hey guys and gals.

I built a tool (https://www.typeui.sh/docs/guides/hermes) that helps you let your Hermes agents build better UI by using design skills that lets you build UI in a certain style.

It automatically installs a collection of markdown files that will:

set the style of the UI (choose from here https://www.typeui.sh/design-skills)
installs a UI/UX fundamentals skill file

And then websites generated by your Hermes agent will look like one of the skills that you select from the website.

It's also on Github:

https://github.com/bergside/typeui

u/elwingo1 — 1 day ago

▲ 328 r/ZaiGLM+69 crossposts

I built an open-source, self-hosted AI gateway: 237 providers (90+ free), auto-fallback combos, and a 10-engine token-compression pipeline (MIT)

Builders-welcome post with the substance up front (disclosure: I'm the maintainer). OmniRoute is a free, MIT, self-hosted AI gateway — one OpenAI-compatible endpoint over 237 providers — built around two problems: runs dying on a provider 429, and tokens bleeding on tool/log output.

One endpoint, 237 providers — 90+ of them free. You point any tool or agent at a single OpenAI-compatible endpoint (localhost:20128/v1) and it can reach 237 LLM providers without you rewriting anything. 90+ have free tiers and 11 are free forever (no card), which aggregates to ~1.6B documented free tokens/month — and that's honest, pool-deduped math (we count each shared pool once instead of inflating it; the methodology is public in the repo). There's a one-command setup-* for 13+ coding tools (Claude Code, Codex, Cursor, Cline, Roo, Kilo, Gemini CLI…), so switching your existing setup over takes seconds.

Fallback combos — so it never stops mid-task. A "combo" is a ladder of models the router walks automatically: your subscription first, then API keys, then cheap models, then free ones. When a provider returns a 500 or you hit a rate limit, it slides to the next target in milliseconds, mid-request, and your tool never even sees the error. There are 17 routing strategies (priority, weighted, round-robin, cost-optimized, auto/coding:fast…) plus three resilience layers — a per-provider circuit breaker, a per-key cooldown, and a per-model lockout — so one dead key can't take down a whole provider.

Fusion — an ensemble mode for the hard steps. Beyond simple routing, there's a fusion strategy that fans a single prompt out to a panel of different models in parallel and then has a judge model synthesize one best answer (mixture-of-agents, built in). It's cost-aware, so easy turns stay on one fast model and it only fuses when the step is worth it.

A 10-engine compression pipeline — the part most routers don't have. Every request flows through a transparent compression pass you can toggle/stack per combo. Instead of one trick, it stacks the best of the open-source ecosystem: RTK filters command/tool output (git diffs, test logs, builds) at 60–90%, Microsoft's LLMLingua-2 does ML semantic pruning, Caveman handles prose, session-dedup strips repeats across turns. Critically, code, URLs and JSON are preserved byte-perfect, and a default-on inflation guard throws the compressed version away and sends the original if compressing would actually grow the prompt — it never makes things worse. On tool-heavy sessions that's ~89% average input-token reduction (an 8k-token git diff becomes a few hundred). Full credit to every upstream project (RTK, Caveman, LLMLingua-2, Troglodita) is in the README.

Agent-native — the agent can drive the router itself. There's a built-in MCP server (95 tools across 30 audited scopes, over stdio / SSE / streamable-HTTP), plus A2A (v0.3, JSON-RPC 2.0) support. That means an agent can query providers, switch combos, read its own remaining quota and manage memory through the gateway — not just consume tokens through it.

It's 100% local (zero telemetry, AES-256-GCM at rest), MIT-licensed, has a prompt-injection guard on every LLM route, opt-in memory, and runs on npm, Docker, desktop or your phone via Termux.

For context on whether it's worth your time: it's grown to ~9.8K GitHub stars, 1,490+ forks and 280+ contributors in ~4.5 months, with 21,000+ automated tests and 1,830+ issues closed — so it's a battle-tested project, not a brand-new experiment.

npm install -g omniroute

GitHub: https://github.com/diegosouzapw/OmniRoute · Site: https://omniroute.online

Would value a critique of the routing/compression architecture from this crowd.

u/ZombieGold5145 — 2 days ago

▲ 54 r/ZaiGLM+13 crossposts

I wanted to learn how coding agents work, so I built one and want to share what I learned

Hey everyone!
I'd like to share a project I've been working on, it's called Orin and it's a coding agent.

I use coding agents constantly, and at some point I realized I had basically no idea what was happening between me hitting enter and code showing up.

Also I was tired of building apps I wasn't able to really debug because I didn't know how they were being built in the first place so I got busy studying: read a bunch of articles, still felt like a black box, so I just tried to build one.

Couple things worth saying before anyone digs in:

It's mostly AI-written code, no point in hiding that, but I don't think "written by AI" and "sloppy" have to go together.

I try to run all my projects in the most professional way I know of, following actual SDLC practices: spec first, then an issue, then the implementation, then a real PR review before anything merges, not vibe-coding where you just accept every diff.

Whether that shows in the actual code is for other people to judge, not me.

Also this isn't some original idea I came up with: I cloned and read through pi.dev, nanocoder, and opencode as primary references (and skimmed Cline/Kilo Code for patterns), and basically tried to take what made sense to me from each and put it into one implementation.

My whole idea was try and build something that took the best from each to make a coding agent that would perform well. I plan to benchmark it on SWE-bench Verified sooner or later, but I don't think it's ready just yet: there are rough edges and bugs, but its usable.

Some of the actual implementation stuff, for anyone who cares about those rather than the pitch:

The loop is just: stream a response from the provider, push it to message history, if there are tool calls run them, push the results back, repeat until there's nothing left to call.
The loop is completely headless — it doesn't touch the terminal, it just emits events. The TUI (SolidJS on top of OpenTUI, just like opencode) is a separate subscriber to those events. You could swap in a totally different frontend without touching the loop at all.
Another thing I got from OpenCode are edits: they go through a fuzzy replacer chain, not a single exact string match — if the model's oldText is off by whitespace or indentation, it falls through a chain of matchers before giving up. I had never thought about this and can confirm it's the kind of thing you don't appreciate until you actually try to implement it.
There's a model routing mechanism that switches different models based on what the agent has to do:
- explore runs on a cheap/fast model by default,
- implement on a code-tuned model,
- review on the main model.
Another thing I borrowed from the web is a delegate_read tool that lets the main agent hand off read-heavy grunt work (scanning a big file, summarizing logs) to a cheap model so that content never bloats the main context.
- It's basically a one off LLM call that only returns a distilled summary, seems dumb but works surprisingly well with capable models like Claude who know exactly what to look for and delegate super well to other agents.
Tool selection isn't a static allow-list. Every turn runs a BM25 retrieval pass over the full tool catalog (including MCP tools) via a super cool library called Ratel, so the model only ever sees the tools relevant to what it's doing in that specific turn instead of the whole catalog every time. There's even an A/B flag to compare tool_pool=ratel vs tool_pool=default in your own telemetry to see if it even makes a difference (similar to how rtk gain works).
Every file write gets snapshotted into a shadow git history before it happens, including stuff done through raw bash — allowing the agent to have a proper /undo /redo command.
When I implemented subagents I wanted to explore different isolation mechanisms and ended up with 3 different ones you can configure yourself:
- shared (edits land on the main working tree, safe because they run serially),
- worktree (isolated branch)
- sandbox (a real E2B cloud VM, edits get thrown away on dispose — for code you don't trust at all).
- The lead model can escalate isolation for a given task but never go below the configured floor.
I implemented hooks borrowing from nanocoder and opencode. This allows the agent to be expanded by third party code and I bundled some sensible defaults:
- there's a before_tool hook that rewrites bash commands through rtk so that command output gets compressed before it ever reaches the model.
In my daily work I build AI agents and vibe coded internal tools for my company and after a while I saw how much telemetry is crucial for debugging and actually understanding agent behaviour, so I decided that my agent would ship native OTLP tracing by default.
- This means that by adding just one environment variable you can see full traces in your telemetry platform (Langfuse, Tempo, Jaeger, whatever you like) out of the box.
Orin is also provider-agnostic (currently supports OpenRouter, OpenAI, Anthropic, OpenCode Go/Zen and Regolo if you want an EU-hosted option) — switching provider or model happens at runtime through a provider registry, no restart needed.

None of this is groundbreaking, it's just what I landed on after reading other people's code and deciding what to keep.

Try it:

git clone https://github.com/thetombrider/coding_agent.git

cd coding_agent

./install.sh

orin

There's also a deepwiki writeup if you want the architecture without reading source: https://deepwiki.com/thetombrider/coding_agent

I would really appreciate feedback in any shape or form. I'm learning and sharing my journey, hope it helps someone.

u/Immediate_House_6901 — 1 day ago

▲ 10 r/ZaiGLM

Getting blueballed by BigModel.CN

Everyday for the last three weeks, I wake up 5 minutes before the supposed 'restock', which is 5 AM my time. Every day, 3 minutes before restock time, it says "too many people are buying", and continues that for anywhere between 8-31 minutes before hitting me with the temporarily sold out.

I've verified my passport and chinese phone number - wondering if even after that anyone has found any useful services as an alternative. I'm really hesitant to use smaller companies, and prefer well known brands. Someone mentioned xfyun, I've heard of 硅基流动, but if anyone has any suggestions I'm all ears.

u/Crafty_Gas_8902 — 1 day ago