
u/Typical-Fee2262

Evaluating LLM gateways for a dynamic NPC engine
We're building the backend for an RPG where NPC dialogue generates on the fly. To save costs, we mix models depending on the character: a fast, cheap model for random townsfolk chatter and a heavier one for main characters dropping actual lore. Juggling direct API keys and tracking latency across different providers quickly turned into a mess, though. I realized we need more than a list of endpoints; we need an actual control plane, basically something that handles the routing, keeps the connection stable, and gives us real observability into what's failing under the hood. We've been benchmarking a few LLM gateways lately to see what actually survives in prod.
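A minimal sketch of the tier-based routing idea described above, in Python. The model IDs, token caps, and timeouts are made-up placeholders for illustration, not real model names or settings from any of the gateways below.

```python
from dataclasses import dataclass
from enum import Enum


class NpcTier(Enum):
    TOWNSFOLK = "townsfolk"
    MAIN_CHARACTER = "main_character"


@dataclass(frozen=True)
class ModelRoute:
    model: str        # placeholder model id (assumption, not a real model)
    max_tokens: int   # cap on reply length for this tier
    timeout_s: float  # latency budget before we give up on the call


# Hypothetical routing table: cheap and fast for ambient chatter,
# heavier and slower for lore-critical characters.
ROUTES = {
    NpcTier.TOWNSFOLK: ModelRoute("cheap-fast-model", max_tokens=80, timeout_s=1.5),
    NpcTier.MAIN_CHARACTER: ModelRoute("heavy-lore-model", max_tokens=400, timeout_s=6.0),
}


def pick_route(tier: NpcTier) -> ModelRoute:
    """Return the model route configured for the given NPC tier."""
    return ROUTES[tier]
```

Keeping the table as plain data means a new tier (say, quest givers) is a one-line change, and the same table can feed whichever gateway ends up in front of the providers.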
• OpenRouter: OpenRouter was the easiest one for testing models quickly. For NPC dialogue, that matters because random townsfolk and main-story characters do not need the same model. Being able to swap models without changing much backend code made early testing much faster. The downside is cost. The extra fee does not feel huge during testing, but once every player interaction can trigger a model call, it starts to add up. I liked it a lot for model discovery, but I am less sure about using it as the long-term path for high-volume NPC traffic.
• LiteLLM: LiteLLM felt like the best option if you want more control and do not mind owning more of the infra yourself. I liked that it gives a unified interface across providers while still letting us customize routing, retries, and fallback behavior around our own backend. The tradeoff is maintenance. Once you need dashboards, alerts, cost reports, debugging tools, and team visibility, it starts becoming another service your team has to operate. Strong option, but probably better for teams with enough engineering bandwidth.
• Helicone: Helicone felt more like an observability layer than a full routing solution. That is still useful for our case because a lot of the pain is not just picking the right model, but understanding why some NPC replies are slow, expensive, or weirdly inconsistent. I liked it for request logs, latency, token usage, and cost visibility. But if we need gameplay-driven routing, provider fallback, or more complex model selection logic, we would probably still need another layer around it.
• Portkey: Portkey felt powerful, but heavier than what we need right now. The caching and fallback features are attractive, especially for repeated townsfolk greetings or avoiding dead NPC replies when a provider stalls. But the setup felt like it could become its own project. Our backend is already dealing with player state, quest flags, character memory, prompt routing, and safety checks. Portkey might make sense later if the system gets bigger, but for now it feels a little too heavy for our current stage.
• ZenMux: ZenMux felt practical for a production setup without forcing a huge backend change. It gave us enough visibility to debug slow or weird NPC replies, especially when we needed to see the model path, latency, retry behavior, and cost in one place.
The tradeoff is that it is not the most customizable option. LiteLLM still feels better if your team wants full control and does not mind maintaining more infra. But for high-volume NPC dialogue, the lower-friction setup and clearer cost tracking made ZenMux worth testing seriously.
TL;DR: OpenRouter felt best for quick model testing, LiteLLM for teams that want more control, Helicone for observability, Portkey for heavier production workflows, and ZenMux for a lower-friction production setup. For NPC dialogue, the main things we care about are latency, fallback behavior, and cost tracking, because every random player interaction can become a model call.
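To make "latency, fallback behavior, and cost tracking" concrete, here is a rough Python sketch of what a gateway does per request: try providers in order, record latency and estimated cost for each attempt, and fall back to a canned in-world line if everything fails. All names, prices, and stub providers here are hypothetical stand-ins, not any gateway's real API.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A provider call takes a prompt and returns (reply_text, tokens_used).
ProviderFn = Callable[[str], Tuple[str, int]]

# In-world line used when every provider fails, so the NPC never goes silent.
CANNED_LINE = "Hm? Sorry, I was lost in thought."


@dataclass
class CallRecord:
    provider: str
    latency_s: float
    cost_usd: float
    ok: bool


@dataclass
class DialogueClient:
    # Ordered (name, call_fn, usd_per_1k_tokens); the first entry is the primary.
    providers: List[Tuple[str, ProviderFn, float]]
    records: List[CallRecord] = field(default_factory=list)

    def generate(self, prompt: str) -> str:
        for name, call_fn, usd_per_1k in self.providers:
            start = time.perf_counter()
            try:
                text, tokens = call_fn(prompt)
            except Exception:
                # Broad catch for brevity; real code would catch specific errors.
                elapsed = time.perf_counter() - start
                self.records.append(CallRecord(name, elapsed, 0.0, False))
                continue  # fall through to the next provider
            elapsed = time.perf_counter() - start
            cost = tokens / 1000 * usd_per_1k
            self.records.append(CallRecord(name, elapsed, cost, True))
            return text
        return CANNED_LINE


# Stub providers standing in for real API calls (assumptions for the demo).
def flaky_primary(prompt: str) -> Tuple[str, int]:
    raise TimeoutError("provider stalled")


def stable_backup(prompt: str) -> Tuple[str, int]:
    return "Welcome to the village, traveler.", 50


client = DialogueClient([
    ("primary", flaky_primary, 0.50),
    ("backup", stable_backup, 0.20),
])
reply = client.generate("Greet the player.")
```

After the call, `client.records` holds one failed entry for the primary and one successful entry for the backup, which is roughly the per-request view (model path, latency, retries, cost) worth comparing across gateways.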