
GLM 5.1 comparison on wafer pass vs zai
I compared both plans. There are hiccups, but overall it's decent enough to use as my primary provider. I've been thinking about cancelling the legacy plan, but I'll hold off for a while longer.

Extension of my last post about GLM here.
It's not there yet in terms of output tokens per second, but for a flat fee of $10/wk you can set it on background jobs.
Dropped by the founder of Redis: a custom native inference engine built specifically for DeepSeek v4 Flash.
On an M3 Max, 128GB, stock ds4 settings:
- 14–15 t/s at 62K pre-filled actual coding conversation
- memory usage was flat during generation, ~85GB resident
- disk cache is ~8GB for a full 100K context window
- thermals were normal, light fan activity
- inference server is rock solid so far
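For a rough sense of scale, here is a small back-of-the-envelope sketch using the numbers above. It assumes decimal gigabytes and treats the ~8GB and 14–15 t/s figures as approximate; the 1,000-token reply length is a made-up example, not something I measured.

```python
def cache_bytes_per_token(cache_gb: float, context_tokens: int) -> float:
    """Approximate on-disk cache size per context token (decimal GB assumed)."""
    return cache_gb * 1e9 / context_tokens


def generation_seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to generate `tokens` at a steady decode rate."""
    return tokens / tokens_per_second


# ~8GB of disk cache for a 100K-token context window
per_token = cache_bytes_per_token(8, 100_000)
print(f"~{per_token / 1000:.0f} KB of cache per token")

# Hypothetical 1,000-token reply at the observed 14-15 t/s
slow = generation_seconds(1000, 14)
fast = generation_seconds(1000, 15)
print(f"1,000 tokens takes roughly {fast:.0f}-{slow:.0f} s")
```

So a long reply takes around a minute at this speed, which is why it fits background jobs better than interactive use.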
Haven't played around with it yet, but I'm going to give it a go tomorrow when I get time.
Has anyone tried this provider? Would love your genuine feedback.
MiniMax-M2.7 live with a 204,800 token context window, built for long-context coding agents and production engineering workflows. Starting at $10/week.
I've been tracking comparisons for the past few weeks; the fuller comparison is here: https://www.reddit.com/r/ZaiGLM/comments/1sz0gv3/glm51_on_wafer_pass_vs_zai/
For open source I'm very bullish on small providers, especially if they're local.
The comparison is based on end-to-end (E2E) measurements from real usage, so the tokens-per-second figure includes time to first token (TTFT).
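To make the difference concrete, here is a minimal sketch of how an end-to-end rate that counts TTFT differs from a decode-only rate. The function names and the sample numbers are hypothetical, purely for illustration; they are not my measured results.

```python
def e2e_tokens_per_second(output_tokens: int, ttft_s: float, decode_s: float) -> float:
    """Output tokens divided by total wall-clock time, TTFT included."""
    total_s = ttft_s + decode_s
    if total_s <= 0:
        raise ValueError("total time must be positive")
    return output_tokens / total_s


def decode_tokens_per_second(output_tokens: int, decode_s: float) -> float:
    """Output tokens divided by decode time only, ignoring TTFT."""
    if decode_s <= 0:
        raise ValueError("decode time must be positive")
    return output_tokens / decode_s


# Hypothetical request: 600 output tokens, 2 s to first token, 10 s of decoding
print(e2e_tokens_per_second(600, 2.0, 10.0))   # 50.0
print(decode_tokens_per_second(600, 10.0))     # 60.0
```

The E2E number is always the lower of the two, and the gap grows when the prompt is long or the provider is slow to start streaming.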
For a flat fee of $10/week, I'm very bullish on small inference providers.