u/ZombieGold5145

r/Rag r/micro_saas r/promoteMyApp r/MicroSaaSBR r/IA_Italia r/claudeskills r/brdev r/artificial r/AIToolBench r/dyadbuilders r/AIAgentsStack r/StartupSoloFounder r/ZaiGLM r/Temporal r/AiBuilders r/OnlyAICoding r/bestai2026 r/MiniMax_AI r/chatgptplus r/ClaudeCoder r/OpenSourceAI r/n8nbusinessautomation r/Qwen_AI r/n8nforbeginners r/localaiapps r/ContextEngineering r/AgentSkills r/Agentic_AI_For_Devs r/nocode r/LlamaIndex r/coolgithubprojects r/InteligenciArtificial r/devBR r/kiroIDE r/AIProductivityLab r/better_claw r/WebApps r/AIinBusinessNews r/learnmachinelearning r/IMadeThis r/saasbuild r/LLMDevs r/AiChatGPT r/sideprojects r/WTFisAI r/ProgramadoresBrasil r/devtools r/GenAiApps r/projects r/DigitalEscapeTools r/AIAgentsInAction r/MacOSApps r/Development r/mcp r/AISEOInsider r/SelfHostedAI r/OpenClawUseCases r/AIToolsPromptWorkflow r/MCPservers r/GoogleGeminiAI r/ArtificialNtelligence r/prettyusefulwebsites r/AIDiscussion r/PythonBrasil r/WebAfterAI r/DevOpsLinks r/DesenvolvedoresBrasil r/Claudeopus r/AILearningHub r/LLM r/ArtificialInteligence r/coding_agents r/ollama r/PromptEngineering r/LargeLanguageModels r/Buildathon r/opencode r/BuildWithClaude r/microsaas r/learnAIAgents r/appdev r/n8n_on_server r/automation r/VibeCodingList r/ChatGPTPromptGenius r/AIToolsPerformance r/modelcontextprotocol

▲ 3 r/PythonBrasil

Gateway de IA grátis e self-hosted, usável do Python: 237 provedores (90+ grátis) via um base_url no OpenAI SDK, com fallback + compressão (MIT)

Fala, pessoal. Compartilhando um projeto open-source que uso muito a partir do Python (disclosure: sou o mantenedor; é grátis/MIT). Como ele expõe um endpoint compatível com OpenAI, dá pra usar direto do openai do Python só trocando o base_url:

from openai import OpenAI
client = OpenAI(base_url="http://localhost:20128/v1", api_key="...")

E aí seu código Python herda:

Combos de fallback — pra nunca parar no meio da tarefa. Um "combo" é uma escada de modelos que o roteador percorre sozinho: primeiro sua assinatura, depois chaves de API, depois modelos baratos, depois os grátis. Quando um provedor devolve 500 ou você bate no rate limit, ele desliza para o próximo alvo em milissegundos, no meio da requisição, e sua ferramenta nem vê o erro. São 17 estratégias de roteamento mais três camadas de resiliência — circuit breaker por provedor, cooldown por chave e lockout por modelo — então uma chave morta não derruba o provedor inteiro.

Um endpoint, 237 provedores — 90+ deles grátis. Você aponta qualquer ferramenta ou agente para um único endpoint compatível com OpenAI (localhost:20128/v1) e ele alcança 237 provedores de LLM sem reescrever nada. 90+ têm free tier e 11 são grátis pra sempre (sem cartão), somando ~1,6B de tokens grátis/mês documentados — e é uma conta honesta, deduplicada por pool (contamos cada pool compartilhado uma vez, sem inflar; a metodologia está no repositório). Tem setup-* de um comando para 13+ ferramentas (Claude Code, Codex, Cursor, Cline, Roo, Kilo, Gemini CLI…).

Um pipeline de compressão de 10 engines — a parte que a maioria dos roteadores não tem. Toda requisição passa por uma etapa transparente de compressão que você liga/empilha por combo. Em vez de um truque só, ele junta o melhor do ecossistema open-source: o RTK filtra saída de comando/ferramenta (git diff, logs de teste, builds) em 60–90%, o LLMLingua-2 (Microsoft) faz poda semântica por ML, o Caveman cuida de prosa, e a deduplicação remove repetições entre turnos. O crucial: código, URLs e JSON são preservados byte-a-byte, e um guarda de inflação (ligado por padrão) descarta a versão comprimida e envia o original se comprimir fosse aumentar o prompt — nunca piora. Em sessões cheias de ferramenta isso dá ~89% de redução média de tokens de entrada. Todo o crédito às fontes (RTK, Caveman, LLMLingua-2, Troglodita) está no README.

Pra você avaliar se vale o tempo: o projeto passou de ~9,8 mil estrelas no GitHub, 1.490+ forks e 280+ contribuidores em ~4,5 meses, com 21.000+ testes automatizados e 1.830+ issues fechadas — ou seja, é maduro e validado, não um experimento de fim de semana.

npm install -g omniroute

GitHub: https://github.com/diegosouzapw/OmniRoute

Alguém aqui já usa um gateway assim nos projetos Python? Curioso pra saber como vocês tratam fallback.

#	Provider	Prefix	Models	Cost	Auth	Multi-Account
1	Kiro	`kr/`	claude-sonnet-4.5, claude-haiku-4.5, claude-opus-4.6	$0 UNLIMITED	AWS Builder ID OAuth	✅ up to 10
2	Qoder AI	`if/`	kimi-k2-thinking, qwen3-coder-plus, deepseek-r1, minimax-m2.1, kimi-k2	$0 UNLIMITED	Google OAuth / PAT	✅ up to 10
3	LongCat	`lc/`	LongCat-Flash-Lite	$0 (50M tokens/day 🔥)	API Key	—
4	Pollinations	`pol/`	GPT-5, Claude, DeepSeek, Llama 4, Gemini, Mistral	$0 (no key needed!)	None	—
5	Qwen	`qw/`	qwen3-coder-plus, qwen3-coder-flash, qwen3-coder-next, vision-model	$0 UNLIMITED	Device Code	✅ up to 10
6	Gemini CLI	`gc/`	gemini-3-flash, gemini-2.5-pro	$0 (180K/month)	Google OAuth	✅ up to 10
7	Cloudflare AI	`cf/`	Llama 70B, Gemma 3, Whisper, 50+ models	$0 (10K Neurons/day)	API Token	—
8	Scaleway	`scw/`	Qwen3 235B(!), Llama 70B, Mistral, DeepSeek	$0 (1M tokens)	API Key	—
9	Groq	`groq/`	Llama, Gemma, Whisper	$0 (14.4K req/day)	API Key	—
10	NVIDIA NIM	`nvidia/`	70+ open models	$0 (40 RPM forever)	API Key	—
11	Cerebras	`cerebras/`	Llama, Qwen, DeepSeek	$0 (1M tokens/day)	API Key	—

Strategy	What It Does	Best For
Priority	Uses nodes in order, falls to next only on failure	Maximizing primary provider usage
Round Robin	Cycles through nodes with configurable sticky limit (default 3)	Even distribution
Fill First	Exhausts one account before moving to next	Making sure you drain free tiers
Least Used	Routes to the account with oldest lastUsedAt	Balanced distribution over time
Cost Optimized	Routes to cheapest available provider	Minimizing spend
P2C	Picks 2 random nodes, routes to the healthier one	Smart load balance with health awareness
Random	Fisher-Yates shuffle, random selection each request	Unpredictability / anti-fingerprinting
Weighted	Assigns percentage weight to each node	Fine-grained traffic shaping (70% Claude / 30% Gemini)
Auto	6-factor scoring (quota, health, cost, latency, task-fit, stability)	Hands-off intelligent routing
LKGP	Last Known Good Provider — sticks to whatever worked last	Session stickiness / consistency
Context Optimized	Routes to maximize context window size	Long-context workflows
Context Relay	Priority routing + session handoff summaries when accounts rotate	Preserving context across provider switches
Strict Random	True random without sticky affinity	Stateless load distribution

u/ZombieGold5145

Gateway de IA grátis e self-hosted, usável do Python: 237 provedores (90+ grátis) via um base_url no OpenAI SDK, com fallback + compressão (MIT)

Infra for web agents: routing them across 237 providers with millisecond fallback + a cheaper-model ladder (free, self-hosted)

Trimming RAG context before the model: a 10-engine compression pass (60–90% on retrieved/tool output) with byte-perfect code/JSON preservation

OmniRoute (omniroute.online) — a free, self-hosted tool to use 237 AI providers from one place, 90+ free, never rate-limited

Use case: keeping VPS OpenClaw agents cheap and always-on by fronting them with a self-hosted gateway (fallback + compression)

Give your self-hosted n8n AI nodes automatic fallback + free providers — point them at a self-hosted gateway (free, MIT)

Cutting Opus cost and never hitting its limit: a free, self-hosted gateway with token compression + automatic fallback

The wall when building with Claude Code is the usage limit — here's a free, self-hosted way to keep it running past that

Performance notes from an open-source LLM gateway: 60–90% token reduction on tool output + millisecond provider failover — how do you benchmark this?

An AI setup that doesn't stall mid-workflow: route across 237 providers with auto-fallback (90+ free) — sharing how it works

Instead of betting on one AI provider, I route across 237 of them — is multi-provider the pragmatic future, or over-engineering?

A free, no-install-headache way to use many AI models in your no-code stack (90+ free) — auto-switches when one hits a limit

Keep hitting Gemini rate limits / 'unusual activity' walls? I built a free MIT gateway that auto-fails-over Gemini across 237 providers (self-hosted)

A self-hosted gateway so AI automations never stall on a rate limit — 237 providers (90+ free), millisecond fallback (open source)

A free tool for prompt engineers: run the same prompt across 237 models from one endpoint (90+ free), plus Output Styles to steer results

I spent ~4.5 months building a free, self-hosted AI gateway: one endpoint for 237 providers (90+ free), auto-fallback, and a token-compression pipeline (MIT)

I built an open-source, self-hosted AI gateway: 237 providers (90+ free), auto-fallback combos, and a 10-engine token-compression pipeline (MIT)

The problem: every developer using AI tools hits the same walls

The $0/month stack — 11 providers, zero cost, never stops

The Combo System — OmniRoute's core innovation

How combos work

13 Routing Strategies

Auto-Combo: The AI that routes your AI

Context Relay: Session continuity across account rotations

The 4-Tier Smart Fallback

Every tool connects through one endpoint

MCP Server — 25 tools, 3 transports, 10 scopes

Installation — 30 seconds

Real-world playbooks

Playbook A: $0/month — Code forever for free

Playbook B: Maximize paid subscription

Playbook D: 7-layer always-on