u/founders_keepers

GLM 5.1 comparison on wafer pass vs zai
▲ 17 r/ZaiGLM

I compared both plans. There are hiccups, but overall it's decent enough to be used as my primary provider. I've been thinking about cancelling the legacy plan, but I'll hold off for just a while longer.

u/founders_keepers — 8 days ago

Opencode + Cheap DS V4 Pro on Wafer Pass vs Other providers

Extension of my last post about GLM here.

Not there yet in terms of output tokens per second, but for a flat fee of $10/wk you can set it on background jobs.

u/founders_keepers — 10 days ago

GitHub - antirez/ds4: DeepSeek 4 Flash local inference engine for Metal

Dropped by the founder of Redis. This is a custom native inference engine built specifically for DeepSeek v4 Flash.

On an M3 Max, 128GB, stock ds4 settings:
- 14–15 t/s at 62K pre-filled actual coding conversation
- memory usage was flat during generation, ~85GB resident
- disk cache is ~8GB for a full 100K context window
- thermals were normal, light fan activity
- inference server is rock solid so far

Haven't played around with it yet but going to give it a go tomorrow when I get time.

u/founders_keepers — 12 days ago

MiniMax-M2.7 live with a 204,800 token context window, built for long-context coding agents and production engineering workflows. Starting at $10/week.

u/founders_keepers — 15 days ago
▲ 7 r/ZaiGLM

Comparison based on end-to-end (E2E) real usage, so it includes time to first token (TTFT). Measured in tokens per second.

For a flat fee of $10/week, I'm very bullish on small inference providers.
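For anyone wondering what "E2E tokens per second including TTFT" means in practice, here's a rough sketch of how the number could be computed. The function names and the sample numbers are made up for illustration, not taken from any provider's API or from my actual runs. The idea is just that the wait for the first token counts against throughput, so the E2E figure is always at or below the decode-only figure.

```python
def e2e_tokens_per_second(output_tokens: int, ttft_s: float, decode_s: float) -> float:
    """Throughput over the whole request: time to first token plus decode time."""
    total_s = ttft_s + decode_s
    if total_s <= 0:
        raise ValueError("total time must be positive")
    return output_tokens / total_s


def decode_tokens_per_second(output_tokens: int, decode_s: float) -> float:
    """Throughput over the decode phase only, ignoring TTFT."""
    if decode_s <= 0:
        raise ValueError("decode time must be positive")
    return output_tokens / decode_s


# Illustrative numbers only (hypothetical request, not a measurement):
# 600 output tokens, 2 s until the first token, 10 s of streaming after that.
tokens, ttft, decode = 600, 2.0, 10.0
print(e2e_tokens_per_second(tokens, ttft, decode))   # 50.0
print(decode_tokens_per_second(tokens, decode))      # 60.0
```

With a long TTFT the gap between the two numbers gets bigger, which is why short responses look worse on an E2E basis than the advertised decode speed suggests.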

u/founders_keepers — 23 days ago