r/LLM_Gateways

Execution budgets don't just reduce tokens, they reduce unrequested features (847 → 423 tokens)

A couple of days back, I shared Token Sensei, a runtime that gives AI agents a fixed execution budget.

Here's another data point.

Task

Build a Python script that reads a CSV file and prints the average of a numeric column.

Unconstrained Claude

- Named function with a full docstring
- Two example usage blocks
- Interactive `input()` mode
- Warning messages for every skipped row

~50 lines, **847 tokens**

None of those were in the prompt.

Token Sensei (budget 200)

40 lines, **423 tokens**

- CLI using `sys.argv`
- Proper error handling
- No docstrings
- No examples
- No interactive mode

50.1% fewer output tokens (847 → 423), while still satisfying the requested specification.
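For reference, here is a minimal sketch of what a script in that budgeted shape could look like. This is not the actual Token Sensei output; the names `column_average` and `main` are my own placeholders, and silently skipping non-numeric rows is an assumption on my part.

```python
import csv
import sys


def column_average(path, column):
    # Average the values in `column`, skipping cells that are not numeric.
    total = 0.0
    count = 0
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"column {column!r} not found")
        for row in reader:
            try:
                total += float(row[column])
            except (TypeError, ValueError):
                continue  # blank or non-numeric cell: skip without a warning
            count += 1
    if count == 0:
        raise ValueError(f"no numeric values in column {column!r}")
    return total / count


def main(argv):
    # argv mirrors sys.argv: program name, CSV path, column name.
    if len(argv) != 3:
        print(f"usage: {argv[0]} FILE COLUMN", file=sys.stderr)
        return 2
    try:
        print(column_average(argv[1], argv[2]))
    except (OSError, ValueError) as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0
```

To run it as a script, end the file with `raise SystemExit(main(sys.argv))` and call it as `python avg.py data.csv score`.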

I saw the same pattern in three different tasks last week: lower token usage, requirements met, and no unrequested features.

My assessment is that execution budgets don't just shorten outputs. They change what the model prioritizes. With a hard budget, the model spends tokens on the requested task instead of adding features it predicts might be helpful.

Has anyone else observed similar behavior with constrained inference?

GitHub: github.com/shouvik12/token-sensei


u/Substantial_Load_690 — 4 days ago

▲ 21 r/LLM_Gateways +2 crossposts

What's everyone actually using for an AI gateway in prod? Tired of duct-taping LiteLLM together

We're a mid-size eng team, actively building on LLMs. Started with LiteLLM as a proxy because it was the obvious free option and it worked fine for a while.

Problem is we're now at a point where:

- multiple teams are hitting the same OpenAI/Anthropic keys with zero visibility into who's burning what
- we had an incident where one team's batch job ate through our entire monthly quota in 4 hours
- there's no clean way to do fallbacks: when Anthropic had that streaming outage a few months back, we were just down
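To make the first two problems concrete, this is roughly the kind of per-team accounting we're missing. It's a rough sketch, not something we run: the `TeamBudgets` class, the team names, and the caps are all made up for illustration, and a real gateway would persist usage somewhere shared rather than in memory.

```python
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    pass


@dataclass
class TeamBudgets:
    # Monthly token caps per team (hypothetical names and numbers).
    caps: dict
    used: dict = field(default_factory=dict)

    def charge(self, team, tokens):
        # Reject the request up front if it would push the team past its cap.
        cap = self.caps.get(team, 0)
        spent = self.used.get(team, 0)
        if spent + tokens > cap:
            raise BudgetExceeded(f"{team} would exceed its cap of {cap} tokens")
        self.used[team] = spent + tokens

    def report(self):
        # Per-team usage, heaviest consumer first.
        return sorted(self.used.items(), key=lambda item: item[1], reverse=True)


budgets = TeamBudgets(caps={"search": 1000, "batch": 500})
budgets.charge("search", 300)
budgets.charge("batch", 400)
```

With a cap like this in place, a further `budgets.charge("batch", 200)` raises `BudgetExceeded` instead of quietly draining the shared quota.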

We've looked at Portkey (feels very SaaS-y, and it was acquired recently, which adds uncertainty about roadmap and priorities for a long-term infrastructure decision), Helicone (good observability but routing feels thin), and briefly at building something in-house (our infra lead said absolutely not).

Also came across TrueFoundry, which seems more enterprise-focused. They claim sub-3ms overhead and have priority-based fallback chains built in.
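By "priority-based fallback chain" I mean something like the sketch below: try providers in order and move on when one fails. This is my own illustration of the idea, not any vendor's implementation; `call_with_fallback`, `ProviderError`, and the stub providers are hypothetical.

```python
class ProviderError(Exception):
    pass


def call_with_fallback(providers, prompt):
    # `providers` is a list of (name, callable) pairs in priority order.
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # remember why this one failed
    raise ProviderError("all providers failed: " + "; ".join(errors))


def primary(prompt):
    # Stub standing in for a provider that is currently down.
    raise ProviderError("streaming outage")


def secondary(prompt):
    # Stub standing in for a healthy backup provider.
    return f"echo: {prompt}"


served_by, reply = call_with_fallback(
    [("primary", primary), ("secondary", secondary)], "hello"
)
```

Here `served_by` ends up as `"secondary"` because the first provider raised. A production version would also need timeouts, retry limits, and some care about which errors are safe to retry.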

What is everyone else using? Any other options I'm missing?

Not looking for the "just use this" response here, just trying to understand the tradeoffs. We're on AWS, will need VPC deployment eventually.


u/Background-Job-862 — 12 days ago

▲ 23 r/LLM_Gateways

LiteLLM Rust Migration

LiteLLM is moving to Rust. Sub-1ms overhead. A sub-100MB binary. The same Python SDK and AI gateway you already use.

Over the past year we've heard the same thing from our users and our community - they want the fastest, most lite AI gateway they can run. We've heard you, and we're committing to it.

This goes straight at the problems our customers report: latency spikes under load, and the memory leaks and OOM kills that take pods down at the worst possible time. A Rust hot path is faster and bounded in memory, so those whole classes of issues go away.

It's a gradual, non-breaking change. The Python SDK and proxy stay exactly the same - under the hood they start calling the Rust binary through PyO3, one component at a time, each proven in production before the next.

The whole AI gateway will be running on Rust by December 1, 2026.

We think this is the right call to build the best, most scalable, and cheapest AI gateway out there.

Read the full announcement: https://docs.litellm.ai/blog/litellm-rust-launch

u/WarningOut_OfMinD — 13 days ago

▲ 8 r/LLM_Gateways

What LLM gateway are you using in production in 2026?

We're handling 25K AI requests a day across OpenAI, Anthropic, and Bedrock. Direct API integrations are becoming difficult to manage because of provider outages, poor cost visibility, API key sprawl, and customer-tier rate limiting. We've been evaluating gateways, and so far Bifrost, TrueFoundry AI Gateway, Cloudflare AI Gateway, LiteLLM, and Kong AI Gateway seem like the strongest options, each with different tradeoffs around performance, governance, observability, and operational complexity. Curious what others are running in production today and what has worked best at scale.
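For context on the customer-tier rate limiting piece, this is the sort of logic we'd want a gateway to own instead of us. It's a minimal token-bucket sketch under my own assumptions: the tier names, the limits in `TIER_LIMITS`, and the `TierLimiter` class are hypothetical, and state is in memory only.

```python
import time

# (requests per second, burst size) per tier; illustrative values only.
TIER_LIMITS = {"free": (1, 5), "pro": (10, 50)}


class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock
        self.updated = clock()

    def allow(self, cost=1):
        # Refill based on elapsed time, then spend if enough tokens remain.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TierLimiter:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.buckets = {}

    def allow(self, customer_id, tier):
        # One bucket per customer, sized by that customer's tier.
        bucket = self.buckets.get(customer_id)
        if bucket is None:
            rate, burst = TIER_LIMITS[tier]
            bucket = self.buckets[customer_id] = TokenBucket(rate, burst, self.clock)
        return bucket.allow()
```

Calling `TierLimiter().allow("customer-1", "free")` returns `True` until that customer's burst is spent, then `False` until the bucket refills. Across multiple gateway replicas the buckets would have to live in a shared store, which is part of why we'd rather not build this ourselves.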


u/PreviousAd9843 — 13 days ago