u/LocoMod

Hello everyone. For a bit of context, I have Tier 5 accounts with the OpenAI and Anthropic APIs. I just signed up for a paid OpenRouter account because I am looking for cheaper but capable models for simpler tasks. I also have several machines in my home lab running open-weight ~27b-~120b models across various nodes.

I have configured three different agents, each backed by OpenRouter, using Grok, DeepSeek v4 pro, and Kimi k2.6 respectively.

It seems to take a while to begin receiving streamed responses, even the thought summaries. I have a complex agent setup where large inputs plus context are sent, and the first-party API providers breeze through this.

I just started testing all of this about an hour ago, but I am wondering if this is the typical experience? For those who use first party providers and also OpenRouter, should I expect this kind of latency? I am also attempting to run multiple agents backed by different models and it seems like my requests are being queued, so even if I have different sessions backed by different models, only one is running?
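For anyone who wants to compare numbers, here is a minimal sketch of how the "are my requests being queued?" question could be checked: start all streams at once, record the time to first chunk for each, and compare the total wall time against the sum of the individual times. Everything named here (`time_to_first_chunk`, `measure_concurrently`, `fake_stream`) is hypothetical and made up for illustration; `fake_stream` is only a stand-in that sleeps, and would need to be swapped for whatever streaming call your client actually makes.

```python
import asyncio
import time
from typing import AsyncGenerator, Callable, Dict, Optional, Tuple

StreamFactory = Callable[[], AsyncGenerator[str, None]]


async def time_to_first_chunk(make_stream: StreamFactory) -> Optional[float]:
    """Seconds from opening the stream until its first chunk arrives."""
    start = time.perf_counter()
    stream = make_stream()
    try:
        await stream.__anext__()  # wait for the first chunk only
        return time.perf_counter() - start
    except StopAsyncIteration:
        return None  # the stream ended without producing anything
    finally:
        await stream.aclose()  # do not leave the generator half-consumed


async def measure_concurrently(
    streams: Dict[str, StreamFactory],
) -> Tuple[Dict[str, Optional[float]], float]:
    """Start every stream at once; return per-stream timings and wall time."""
    names = list(streams)
    wall_start = time.perf_counter()
    timings = await asyncio.gather(
        *(time_to_first_chunk(streams[name]) for name in names)
    )
    wall = time.perf_counter() - wall_start
    return dict(zip(names, timings)), wall


async def fake_stream(delay: float) -> AsyncGenerator[str, None]:
    # Hypothetical stand-in for a real streaming request (assumption):
    # sleeps for `delay` seconds, then yields chunks.
    await asyncio.sleep(delay)
    yield "first chunk"
    yield "second chunk"


if __name__ == "__main__":
    demo = {
        "agent-a": lambda: fake_stream(0.1),
        "agent-b": lambda: fake_stream(0.2),
        "agent-c": lambda: fake_stream(0.3),
    }
    per_stream, wall = asyncio.run(measure_concurrently(demo))
    for name, seconds in per_stream.items():
        print(name, seconds)
    print("wall time:", wall)
```

If the wall time is close to the slowest single stream, the requests ran in parallel. If it is close to the sum of all of them, something (the client, the agent framework, or the provider) is serializing them.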

I admit I haven't delved deep into the documentation. But so far the experience is not good. It works, but the latency and performance leave a lot to be desired compared with my experience with the first-party OpenAI, Anthropic, and Google APIs, as well as my own locally hosted models.

Performance compared to first party providers