r/MiniMax_AI

▲ 326 r/MiniMax_AI+69 crossposts

I built an open-source, self-hosted AI gateway: 237 providers (90+ free), auto-fallback combos, and a 10-engine token-compression pipeline (MIT)

Builders-welcome post with the substance up front (disclosure: I'm the maintainer). OmniRoute is a free, MIT, self-hosted AI gateway — one OpenAI-compatible endpoint over 237 providers — built around two problems: runs dying on a provider 429, and tokens bleeding on tool/log output.

One endpoint, 237 providers — 90+ of them free. You point any tool or agent at a single OpenAI-compatible endpoint (localhost:20128/v1) and it can reach 237 LLM providers without you rewriting anything. 90+ have free tiers and 11 are free forever (no card), which aggregates to ~1.6B documented free tokens/month — and that's honest, pool-deduped math (we count each shared pool once instead of inflating it; the methodology is public in the repo). There's a one-command setup-* for 13+ coding tools (Claude Code, Codex, Cursor, Cline, Roo, Kilo, Gemini CLI…), so switching your existing setup over takes seconds.

Fallback combos — so it never stops mid-task. A "combo" is a ladder of models the router walks automatically: your subscription first, then API keys, then cheap models, then free ones. When a provider returns a 500 or you hit a rate limit, it slides to the next target in milliseconds, mid-request, and your tool never even sees the error. There are 17 routing strategies (priority, weighted, round-robin, cost-optimized, auto/coding:fast…) plus three resilience layers — a per-provider circuit breaker, a per-key cooldown, and a per-model lockout — so one dead key can't take down a whole provider.

Fusion — an ensemble mode for the hard steps. Beyond simple routing, there's a fusion strategy that fans a single prompt out to a panel of different models in parallel and then has a judge model synthesize one best answer (mixture-of-agents, built in). It's cost-aware, so easy turns stay on one fast model and it only fuses when the step is worth it.

A 10-engine compression pipeline — the part most routers don't have. Every request flows through a transparent compression pass you can toggle/stack per combo. Instead of one trick, it stacks the best of the open-source ecosystem: RTK filters command/tool output (git diffs, test logs, builds) at 60–90%, Microsoft's LLMLingua-2 does ML semantic pruning, Caveman handles prose, session-dedup strips repeats across turns. Critically, code, URLs and JSON are preserved byte-perfect, and a default-on inflation guard throws the compressed version away and sends the original if compressing would actually grow the prompt — it never makes things worse. On tool-heavy sessions that's ~89% average input-token reduction (an 8k-token git diff becomes a few hundred). Full credit to every upstream project (RTK, Caveman, LLMLingua-2, Troglodita) is in the README.

Agent-native — the agent can drive the router itself. There's a built-in MCP server (95 tools across 30 audited scopes, over stdio / SSE / streamable-HTTP), plus A2A (v0.3, JSON-RPC 2.0) support. That means an agent can query providers, switch combos, read its own remaining quota and manage memory through the gateway — not just consume tokens through it.

It's 100% local (zero telemetry, AES-256-GCM at rest), MIT-licensed, has a prompt-injection guard on every LLM route, opt-in memory, and runs on npm, Docker, desktop or your phone via Termux.

For context on whether it's worth your time: it's grown to ~9.8K GitHub stars, 1,490+ forks and 280+ contributors in ~4.5 months, with 21,000+ automated tests and 1,830+ issues closed — so it's a battle-tested project, not a brand-new experiment.

npm install -g omniroute

GitHub: https://github.com/diegosouzapw/OmniRoute · Site: https://omniroute.online

Would value a critique of the routing/compression architecture from this crowd.

u/ZombieGold5145 — 2 days ago

▲ 3 r/MiniMax_AI

Need Advice: MiniMax vs Z.ai vs Kimi

I already have Codex and Antigravity. For heavy coding (backend, architecture, and UI), would you recommend adding the MiniMax Token Plan, Z.ai Coding Plan, or Kimi? Which one has the best quality and value?

r/MiniMax_AI

I built an open-source, self-hosted AI gateway: 237 providers (90+ free), auto-fallback combos, and a 10-engine token-compression pipeline (MIT)

Need Advice: MiniMax vs Z.ai vs Kimi

My Token Plan Experience

NEVER subscribe to MINIMAX for your projects! It is a marketing-heavy trap that lacks the functionality needed for serious projects!

The Minimax M3 Scam: Lies, Mocks, and Complete Disrespect for AGENTS.md

Usage Limits and Credits

Deepseek V4 pro vs Minimax M3. Judge is Opus 4.8. Results are disappointing

Update: my open-source MiniMax GUI is now a native desktop app, MiniMax Studio (Win / macOS / Linux)

Model Stacking GLM 5.2 and Minimax 3

MiniMax_M3 - 5H limit hit in 21 minutes of work! [Similar/Mirrored experience of codex and claude?]

Is Minimax dumb again?

Lesson LEARNED

DO NOT buy the new MiniMax M3 $20 Token Plan for agentic coding. It’s a complete marketing scam.

MiniMax Desktop APP is not good at all.

M3/Token Plan: 753M tokens burned in 25 days with Claude Code - exported the CSV, the numbers are wild

MiniMax Plus 1.7B tokens - real or rate limited?

Minimax M3

For all the "How much do you get?" posts

MiniMax M3 is so cheap yet so powerful!

Does the MiniMax M3 1.7B/month Token Plan count cached tokens, or only uncached tokens?

My observation

My question