r/Temporal

Builders-welcome post with the substance up front (disclosure: I'm the maintainer). OmniRoute is a free, MIT, self-hosted AI gateway — one OpenAI-compatible endpoint over 237 providers — built around two problems: runs dying on a provider 429, and tokens bleeding on tool/log output.

One endpoint, 237 providers — 90+ of them free. You point any tool or agent at a single OpenAI-compatible endpoint (localhost:20128/v1) and it can reach 237 LLM providers without you rewriting anything. 90+ have free tiers and 11 are free forever (no card), which aggregates to ~1.6B documented free tokens/month — and that's honest, pool-deduped math (we count each shared pool once instead of inflating it; the methodology is public in the repo). There's a one-command setup-* for 13+ coding tools (Claude Code, Codex, Cursor, Cline, Roo, Kilo, Gemini CLI…), so switching your existing setup over takes seconds.

Fallback combos — so it never stops mid-task. A "combo" is a ladder of models the router walks automatically: your subscription first, then API keys, then cheap models, then free ones. When a provider returns a 500 or you hit a rate limit, it slides to the next target in milliseconds, mid-request, and your tool never even sees the error. There are 17 routing strategies (priority, weighted, round-robin, cost-optimized, auto/coding:fast…) plus three resilience layers — a per-provider circuit breaker, a per-key cooldown, and a per-model lockout — so one dead key can't take down a whole provider.

Fusion — an ensemble mode for the hard steps. Beyond simple routing, there's a fusion strategy that fans a single prompt out to a panel of different models in parallel and then has a judge model synthesize one best answer (mixture-of-agents, built in). It's cost-aware, so easy turns stay on one fast model and it only fuses when the step is worth it.

A 10-engine compression pipeline — the part most routers don't have. Every request flows through a transparent compression pass you can toggle/stack per combo. Instead of one trick, it stacks the best of the open-source ecosystem: RTK filters command/tool output (git diffs, test logs, builds) at 60–90%, Microsoft's LLMLingua-2 does ML semantic pruning, Caveman handles prose, session-dedup strips repeats across turns. Critically, code, URLs and JSON are preserved byte-perfect, and a default-on inflation guard throws the compressed version away and sends the original if compressing would actually grow the prompt — it never makes things worse. On tool-heavy sessions that's ~89% average input-token reduction (an 8k-token git diff becomes a few hundred). Full credit to every upstream project (RTK, Caveman, LLMLingua-2, Troglodita) is in the README.

Agent-native — the agent can drive the router itself. There's a built-in MCP server (95 tools across 30 audited scopes, over stdio / SSE / streamable-HTTP), plus A2A (v0.3, JSON-RPC 2.0) support. That means an agent can query providers, switch combos, read its own remaining quota and manage memory through the gateway — not just consume tokens through it.

It's 100% local (zero telemetry, AES-256-GCM at rest), MIT-licensed, has a prompt-injection guard on every LLM route, opt-in memory, and runs on npm, Docker, desktop or your phone via Termux.

For context on whether it's worth your time: it's grown to ~9.8K GitHub stars, 1,490+ forks and 280+ contributors in ~4.5 months, with 21,000+ automated tests and 1,830+ issues closed — so it's a battle-tested project, not a brand-new experiment.

npm install -g omniroute

GitHub: https://github.com/diegosouzapw/OmniRoute · Site: https://omniroute.online

Would value a critique of the routing/compression architecture from this crowd.

{ "type": "send-email", "icon": "Mail", "label": "Send Email", "schema": { "type": "object", "properties": { "to": { "type": "string" }, "subject": { "type": "string" }, "retries": { "type": "number" } } } }

I built an open-source, self-hosted AI gateway: 237 providers (90+ free), auto-fallback combos, and a 10-engine token-compression pipeline (MIT)

Workflow Builder - open-source, embeddable React SDK for visual workflow editors (runs on Temporal)