u/Accurate-Pudding-999

I’m working on a multi-model API proxy for teams that use LLMs heavily, and I’d love to get feedback before we onboard our first batch of users later this month.

The idea is to provide one API layer for accessing major current models, with routing, fallback, usage controls, observability, and developer-friendly tooling on top.

I’m especially interested in hearing from:

small and mid-sized companies using AI APIs in production
teams building internal AI tools
people heavily using agentic coding tools
dev teams switching between multiple model providers
anyone dealing with cost, latency, reliability, rate-limit, or quota issues

A few things I’m trying to understand:

Which models would you need supported on day one?
Do you care more about cost, latency, quality, context window, or automatic fallback?
What is the most painful part of your current AI API setup?
Would task-based routing be useful? For example: cheaper models for simple tasks, stronger models for coding and reasoning, and fallback models when a provider is down.
What kind of observability would you want? Logs, traces, cost per user, cost per project, prompt/version tracking, evals?
For agentic coding workflows, what matters most: tool calling reliability, context window, latency, model choice, rate limits, or something else?
If you’re an SMB, what would make you trust a proxy layer enough to put it between your app and the model providers?
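
To make the task-based routing question concrete, here's a minimal sketch of the idea in Python. The task labels, model names, and the `pick_model` helper are all placeholders made up for illustration, not an actual API or implementation; a real router would also need health checks, retries, and per-provider rate-limit handling.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """A preferred model plus ordered fallbacks (all names are hypothetical)."""
    primary: str
    fallbacks: list[str] = field(default_factory=list)


# Hypothetical routing table: cheaper model for simple tasks,
# stronger model for coding/reasoning, each with a fallback.
ROUTES = {
    "simple": Route("cheap-model", ["strong-model"]),
    "coding": Route("strong-model", ["backup-model"]),
}


def pick_model(task_type, unavailable=frozenset(), routes=ROUTES):
    """Return the first available model for a task type.

    `unavailable` is the set of models whose provider is currently down.
    Unknown task types fall back to the "simple" route (an assumption).
    """
    route = routes.get(task_type, routes["simple"])
    for model in [route.primary, *route.fallbacks]:
        if model not in unavailable:
            return model
    raise RuntimeError(f"no available model for task type {task_type!r}")


if __name__ == "__main__":
    print(pick_model("simple"))
    print(pick_model("coding", unavailable={"strong-model"}))
```

The open questions are mostly about the policy around this: who defines the task types, and whether fallback should ever be allowed to silently move a request to a more expensive model.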

Not trying to make this a sales pitch — we’re still shaping the product and want to build around real workflows instead of guessing.

If you use AI APIs heavily, especially for coding agents or production workflows, I’d really appreciate hearing what your ideal setup would look like.

Building a multi-model API proxy for AI-heavy teams — what would you actually want from it?