r/ClaudeCoder

I built an open-source, self-hosted AI gateway: 237 providers (90+ free), auto-fallback combos, and a 10-engine token-compression pipeline (MIT)
▲ 326 r/ClaudeCoder+69 crossposts

I built an open-source, self-hosted AI gateway: 237 providers (90+ free), auto-fallback combos, and a 10-engine token-compression pipeline (MIT)

Builders-welcome post with the substance up front (disclosure: I'm the maintainer). OmniRoute is a free, MIT, self-hosted AI gateway — one OpenAI-compatible endpoint over 237 providers — built around two problems: runs dying on a provider 429, and tokens bleeding on tool/log output.

One endpoint, 237 providers — 90+ of them free. You point any tool or agent at a single OpenAI-compatible endpoint (localhost:20128/v1) and it can reach 237 LLM providers without you rewriting anything. 90+ have free tiers and 11 are free forever (no card), which aggregates to ~1.6B documented free tokens/month — and that's honest, pool-deduped math (we count each shared pool once instead of inflating it; the methodology is public in the repo). There's a one-command setup-* for 13+ coding tools (Claude Code, Codex, Cursor, Cline, Roo, Kilo, Gemini CLI…), so switching your existing setup over takes seconds.

Fallback combos — so it never stops mid-task. A "combo" is a ladder of models the router walks automatically: your subscription first, then API keys, then cheap models, then free ones. When a provider returns a 500 or you hit a rate limit, it slides to the next target in milliseconds, mid-request, and your tool never even sees the error. There are 17 routing strategies (priority, weighted, round-robin, cost-optimized, auto/coding:fast…) plus three resilience layers — a per-provider circuit breaker, a per-key cooldown, and a per-model lockout — so one dead key can't take down a whole provider.

Fusion — an ensemble mode for the hard steps. Beyond simple routing, there's a fusion strategy that fans a single prompt out to a panel of different models in parallel and then has a judge model synthesize one best answer (mixture-of-agents, built in). It's cost-aware, so easy turns stay on one fast model and it only fuses when the step is worth it.

A 10-engine compression pipeline — the part most routers don't have. Every request flows through a transparent compression pass you can toggle/stack per combo. Instead of one trick, it stacks the best of the open-source ecosystem: RTK filters command/tool output (git diffs, test logs, builds) at 60–90%, Microsoft's LLMLingua-2 does ML semantic pruning, Caveman handles prose, session-dedup strips repeats across turns. Critically, code, URLs and JSON are preserved byte-perfect, and a default-on inflation guard throws the compressed version away and sends the original if compressing would actually grow the prompt — it never makes things worse. On tool-heavy sessions that's ~89% average input-token reduction (an 8k-token git diff becomes a few hundred). Full credit to every upstream project (RTK, Caveman, LLMLingua-2, Troglodita) is in the README.

Agent-native — the agent can drive the router itself. There's a built-in MCP server (95 tools across 30 audited scopes, over stdio / SSE / streamable-HTTP), plus A2A (v0.3, JSON-RPC 2.0) support. That means an agent can query providers, switch combos, read its own remaining quota and manage memory through the gateway — not just consume tokens through it.

It's 100% local (zero telemetry, AES-256-GCM at rest), MIT-licensed, has a prompt-injection guard on every LLM route, opt-in memory, and runs on npm, Docker, desktop or your phone via Termux.

For context on whether it's worth your time: it's grown to ~9.8K GitHub stars, 1,490+ forks and 280+ contributors in ~4.5 months, with 21,000+ automated tests and 1,830+ issues closed — so it's a battle-tested project, not a brand-new experiment.

npm install -g omniroute

GitHub: https://github.com/diegosouzapw/OmniRoute · Site: https://omniroute.online

Would value a critique of the routing/compression architecture from this crowd.

u/ZombieGold5145 — 2 days ago
▲ 10 r/ClaudeCoder+1 crossposts

PO and Claude

I recently became a PO (internal system development) and just started using Claude; I could use some tips on how to get the most out of it.

Specifically, building a knowledge base, planning sprints, etc.—does anyone have experience with this?

Best regards

reddit.com
u/WallmagicAI — 2 days ago
▲ 28 r/ClaudeCoder+13 crossposts

Deterministic folding for LLM agents: continuity without LLM compaction

I just open-sourced Context Warp Drive, a continuity engine for LLM agents.

Repo: https://github.com/dogtorjonah/context-warp-drive

Right now, the industry has two bad ways of dealing with long agent horizons:

  1. Just ride the 1M-2M context window.
  2. Use an LLM to summarize older messages ("compaction").

LLM summaries are inconsistent, they burn an extra model round-trip, they quietly drop the exact identifiers your agent needs (UUIDs, paths, hashes), and worst of all, they constantly rewrite the prefix—which trashes your provider prompt cache.

This library takes a different approach: deterministic folding.

As the agent works, older context is folded into deterministic skeletons. Instead of linearly bloating to the ceiling, the active context sawtooths—building up efficiently, then dropping back down to a clean floor without losing continuity.

Why not just use the 1M token window?

Because 95% of what an agent carries with it on a long task isn't needed right now. It's looking for the needle in the haystack, but massive context windows force it to carry all the hay.

A larger window raises the ceiling, but it doesn't move the floor where models reason best. Long-context evals keep showing the same thing—models do not use giant contexts as cleanly as the marketing numbers imply:

By keeping the agent deterministically folding with a warm cache and a low context band, you keep it snappy, cheap, and focused. You leave the hay behind until it's actually needed.

How Context Warp Drive works:

  • The Rebirth Seed: The continuity package that makes the full reset possible. It carries the recent user and AI messages, what the agent was actively working on and editing, its execution plan state, preserved exact identifiers from the full trace, and episodic context from earlier work. It is not a vague summary—it is a structured, deterministic snapshot the agent can wake up from and continue seamlessly.
  • Cache-Hot Appending: As the agent works, older turns fold into compact bands that append onto the rebirth seed. The context builds up over time, but because the seed stays byte-identical, you pay for cheap cache reads turn after turn instead of expensive fresh inputs.
  • The Sawtooth Reset: You can't append forever. When measured input pressure hits your configured ceiling, the engine performs the full sawtooth—the context drops back to a fresh rebirth seed and the cycle continues from a low-context floor.
  • Zero-LLM Folding: Raw chat history stays preserved as the source of truth, but the model sees a deterministic compact view. Tool calls, paths, receipts, retained reasoning, and exact identifiers are all preserved without asking another model to summarize anything.
  • Episodic Recall: When the agent re-touches a path or concept from before the reset, the engine pages the relevant folded detail back in. The agent doesn't carry all the hay—it pulls it back when it matters.
  • Task Rail: I also included a portable execution primitive called TaskRail. It keeps long-horizon plan state outside the prompt: steps, progress, acceptance criteria, and serializable checkpoints. Combined with folding and rebirth seeds, the agent stays low-context while still knowing exactly where it is in a multi-step workflow.

What's in the repo:

  • Core folding engine, provider-agnostic across Anthropic content blocks, OpenAI-style tool_calls, and Gemini parts.
  • Anthropic prompt-cache breakpoint helpers to maximize read-hits.
  • Raw rebirth seed renderer.
  • Model-aware context budget resolver.
  • Fold recall and episodic recall (with an optional SQLite episode store).
  • Portable Task Rail state machine.
  • Gemini CLI and Codex CLI folding adapters.

There are a lot of knobs you can tune, but the core philosophy is the same: use the 1M window as safety headroom, not as the operating band.

(Not on npm yet—install from source for now.)

I've been running this in my own multi-agent orchestration stack for months and completely dropped LLM compaction. The difference is fundamental: the agent stops treating context as a giant backpack and starts treating it like a paged working set—small, hot, recoverable, and always grounded in the raw trace.

u/MusicToThyEars — 2 days ago
▲ 1 r/ClaudeCoder+1 crossposts

Need help turning my custom website into a Shopify theme

Hi everyone,

I'm building my clothing brand, LOCARD, and I've designed a completely custom website using HTML, CSS, JavaScript, and Claude Code instead of a standard Shopify theme.

I've uploaded it to Shopify, but I'm struggling to make it work properly with Shopify's backend.

I need help connecting it to:

  • Shopify products
  • Collections
  • Product pages
  • Add to Cart
  • Cart
  • Checkout
  • Variants
  • Search

I don't want to use a standard Shopify theme like Dawn. I want to keep my current design exactly as it is and simply make it work with Shopify.

I've attached a few screenshots of the website.

Has anyone done something similar before or have any advice on the best way to approach this?

Any help would be greatly appreciated. Thanks!

reddit.com
u/No-Leadership-5801 — 7 days ago
▲ 1 r/ClaudeCoder+1 crossposts

Claude instantly hits the 5-hour message limit when resuming a previously completed coding task — why

https://preview.redd.it/6b5wthcm7o9h1.png?width=1538&format=png&auto=webp&s=eff09ca678377626f8169914c33d86b4933c8137

I'm using Claude to build a large HTML/CSS/JS project (a multi-screen interactive website). Because the file became very large, I had to work across multiple sessions due to the 5-hour usage limit.

Here's what happened:

  • Claude had already completed the heavy work in a previous session. It had analyzed files, applied modifications, built the updated HTML, and even showed logs like:
    • "Execute clean v8 build from v7 base"
    • "Done! 24.80 MB"
    • "Copy already-built file to outputs"
  • At that point, the only thing remaining seemed to be returning/exporting the final HTML file to me.

I then waited for the full 5-hour cooldown period. After the limit reset, I sent a very short prompt essentially saying:

>

However, within less than a minute, Claude immediately showed the "You've reached your 5-hour usage limit" message again, without actually giving me the file.

What's confusing is:

  • No new analysis was requested.
  • No new files were uploaded.
  • No additional modifications were needed.
  • The expensive processing had already been completed before the previous limit exhaustion.

This has happened to me more than once, on different Claude accounts and in different conversations.

My questions are: NO Questions 😭 JUST TELL ME HOW TO GET MY FILE OUT OF IT IN NEXT PROMPT.

I'd appreciate any explanation from people who have worked with long coding sessions in Claude. Thanks!

reddit.com
u/Low-Measurement0001 — 9 days ago
▲ 1 r/ClaudeCoder+1 crossposts

Cómo manejan equipos de desarrollo el código cuando usan Claude Code?

Tengo un negocio de software a medida, trabajo solo hace años con Claude Code en mis proyectos locales. Ahora estoy en el punto de contratar a un desarrollador para escalar, pero tengo un cuello de botella: todo mi código y conocimiento está en mi máquina local.

El problema concreto: cuando viajo o no estoy en mi oficina, no puedo seguir trabajando porque Claude Code necesita acceso a los archivos locales. Mi desarrollador tendría el mismo problema estaría atado a una máquina específica.

La pregunta es, cómo lo hace la gente hoy? Ponen los proyectos en un servidor centralizado para que múltiples personas trabajen con Claude Code desde cualquier lado? O siguen el flujo clásico de GitHub con cada dev en su máquina local?

Me preocupa que si migro a un servidor remoto pierda eficiencia con Claude Code, o que no sea el flujo que usa la comunidad actualmente.

Alguien está trabajando así? Cómo lo están resolviendo?

Gracias!

reddit.com
u/Bectec_Software — 8 days ago
▲ 3 r/ClaudeCoder+1 crossposts

How do you review big Claude Code changes?

For people using Claude Code on real repos:

When Claude makes a bigger change, how do you usually review what actually happened before trusting it?

Do you mostly rely on git diff and tests, or do you ever wish you had a session-level view of what happened — tools called, files touched, edits/writes, and where it kept looping/churning?

I’m asking because I’ve been testing this on my own sessions and I’m trying to figure out whether this is a real workflow pain or just my own paranoia.

Curious what your review process looks like after a big Claude Code run.

reddit.com
u/ardauo2012 — 13 days ago
▲ 3 r/ClaudeCoder+3 crossposts

OpenAI Unveils Its First Custom AI Chip, Built for ChatGPT and Future AI Agents

[effacé]

u/Key-Twist-1846 — 10 days ago