r/openrouter

Please give me your ranking for roleplaying models

1- Claude (sonnet or opus): my top favorite.

2- Cydonia (good at following instructions)

.

Now, the trash ones

1- Hermes: Lacks creativity, only write what you want but in a beautiful style. Wish if it assists with more brain

2- Gemeni: Extremely rigid. Forget your unique personalties writing very fast. Lean into classic troupes like if it is religion

3- GPT: it looks good at start but after many trials you realize it is one trick pony.

Mind you, I experiment with paramters changing and custom instructions a lot

Edit: Forget to mention

Deepseek models: Trash

GLM models: Trash

Mistral: Trash

All trash models are not gonna look like trash from first use. In fact, it may look like you hit the jackpot

Forgive my french

reddit.com
u/Damn_You_Hindsight — 20 hours ago

Performance compared to first party providers

Hello everyone. For a bit of context, I have Tier 5 accounts with OpenAI andAnthropic API's. I just signed up for a paid OpenRouter account because I am seeking cheaper but capable models for simpler tasks. I also have several machines in my home lab running the open weight ~27b-~120b models across various nodes.

I have configured three different agents, each backed by OpenRouter and grok, deepseek v4 pro, and Kimi k2.6

It seems to take a while to begin receiving streamed responses, even the thought summaries. I have a complex agent setup where large inputs+context is sent and the first party API providers breeze through this.

I just started testing all of this about an hour ago, but I am wondering if this is the typical experience? For those who use first party providers and also OpenRouter, should I expect this kind of latency? I am also attempting to run multiple agents backed by different models and it seems like my requests are being queued, so even if I have different sessions backed by different models, only one is running?

I admit I havent delved deep into the documentation. But so far the experience is not good. It works. But the latency and performance leaves a lot to be desired vs my experience with first party OpenAI, Anthropic and Google API's, as well as my own locally hosted models.

reddit.com
u/LocoMod — 22 hours ago

Help

Dose anyone know how to fix the text messing up words cause I’m using deepseek but for some reason the text messing up is there like anyway to fix this man

u/GodofSinsJay — 22 hours ago

What are some good Open Router alternatives?

Recently started exploring beyond the #1 in the ai api space. Interested in others experiences!

reddit.com
u/FiLo420blazeit — 2 days ago

$18 to $2 on the same agent run by not using opus for every step

Ran a 180 tool call browse and summarize agent on a long paper last weekend. Opus 4.7 for everything: about $18. After I routed the routine steps like search loops, summarization, and basic parsing to deepseek v4 and hunyuan hy3 preview while keeping opus only for the final synthesis pass, the total dropped to roughly $2. I was expecting noticeable quality loss on the routine parts but honestly could not tell the difference. OpenRouter rankings had the latter at #1 by tool call volume after its launch and in my mcp setup it handled every call cleanly. The hybrid reasoning modes helped too since I could keep most steps in no_think mode to save tokens.

Still reach for opus when debugging across unfamiliar codebases or anything that needs deep architectural reasoning, the cheaper models noticeably stumble there. But for the 80% of steps that are just calling a tool and parsing a result, paying frontier rates is hard to justify. For context the tencent cloud pricing on the model is about $0.18 per million input tokens.

reddit.com
u/Jazzlike_Process_202 — 2 days ago

When I use Gemma 4 31b it free, it keeps giving the same error, no matter how many times I reroll, how much time I waited.

PROXY ERROR 42#9: {"error":{"message":"Provider returned error","code":"metadata":{"raw":"google/gemma-4-31b-it:free is temporarily rate-limited upstream. Please retry shortly, or add your own key to accumulate your rate limits: https://openrouter.ai/settings/integrations","provider\_name":"Google AI Studio","is_byok":false}},"user_id":"user_33Kn3gcpPEK3PzoHvylvfK4ObsI"} (unk)

This thing. It's been almost 2 months. It's happened so much. How can I solve it? What should I do?

u/Dalron_Stinger — 3 days ago
▲ 0 r/openrouter+1 crossposts

Prepare to spread your wallets cheeks from $155.90 to $534.72

June 1st is just around the corner... be ready to open your wallet and spend more....

https://preview.redd.it/mbkzp34fwt1h1.png?width=1464&format=png&auto=webp&s=2502987125e846fbd3556c3511f71ae97107563a

To give more context this bill is just spent developing my SaaS LaundromatAI Smart AI Laundry POS & Management https://laundromatai.app not promoting here just only to give context to how the upcoming bill if you launch your own saas. OMG

reddit.com
u/One-Gas-74 — 4 days ago
▲ 67 r/openrouter+1 crossposts

PSA for OpenRouter users

This might be common knowledge. But if you use openrouter there are a few things that can really drive up your token usage that you might not consider.

  1. Provider switching kills input caching. Every time openrouter switches you to a different provider even on the same model it wipes your cache and your whole context is reloaded. I have seen it switch providers every few prompts. Once I locked it down to a specific provider per model my cache got much more stable. 99% caching every time.

  2. Multiple agents with the same API key going to the same model and same provider can also reset your cache. Every prompt from a different agent looks like it's from the same source so it's blows away the cache. I give each profile/agent Thier own API key now.

  3. Not all providers cache equally. I noticed when I was routed to certain providers I got almost no cache credit while other caches almost everything.

Given cached input is 90% or more cheaper, making these changes cut my token consumption down dramatically. Now I don't worry about a 150k context length becuase 149,850 of it is cached.

reddit.com
u/EconomyPhotograph927 — 6 days ago

Things I figured out about OpenRouter after building on it for 6+ months (that the docs don't really explain)

Spent a lot of time digging into OpenRouter's behavior in production. Sharing what actually surprised me in case it saves someone else the debugging time.

1. The "free" models have wildly different rate limits — and they're not documented per-model

The free tier isn't uniform. Some models (like older Mistral ones) will four-two-nine you almost immediately under any real load. Others are surprisingly tolerant. The only reliable way to know is to test. If you're building anything beyond personal use, treat free models as prototypes only.

2. Context window ≠ usable context window

Some providers serving the same model through OR will silently truncate your context rather than returning an error. You'll get a response, it'll look fine, and your earlier conversation will just be... gone. If your app is context-sensitive, log your token counts and compare what you sent vs what the model appears to have seen.

3. Provider fallback order matters more than the model choice

For the same model slug, different providers have different p99 latency profiles. The default routing picks on price + availability, but if you need consistent latency, pin your provider explicitly with provider.order. The difference between best and worst provider for the same model can be 3-4x on tail latency.

4. Streaming + tool use has edge cases that'll bite you

If you're using function calling / tool use with streaming enabled, some providers will buffer the entire tool response before streaming starts anyway. Others will stream partial JSON. Write your parser defensively — don't assume clean chunk boundaries.

5. The /models endpoint is your best friend for cost optimization

Pull it programmatically, not just for browsing. Model pricing changes more often than you'd expect. If you're doing high-volume work, a monthly diff on pricing can reveal meaningful savings opportunities, especially as new cheaper models ship.

6. X-Title header actually affects your usage analytics

Pass it. If you're running multiple apps or experiments through the same OR key, the title header lets you filter your usage dashboard cleanly. Takes 30 seconds to add and makes debugging much easier when you're trying to trace a cost spike.

Bonus: for teams outside the US/EU

OR requires a credit card for paid usage, which is a blocker for a lot of developers in regions where that's not easy. Worth knowing before you build a dependency on it.

What's been your biggest "I wish I knew this earlier" moment with OpenRouter? Especially curious if anyone has tips on provider pinning strategy.

reddit.com
u/FiLo420blazeit — 6 days ago

I did everything even asked GPT and others AI's but nothing working, please help

The model is openrouter/owl-alpha

u/jackmaxs20 — 5 days ago

Openrouter routing Deepseek V4 via NovitaAI even at cheapest-first setting

Deepseek offers their model via openrouter at near 1/10th the cost, but even with all settings to allow Deepseek as provider (data usage concerns setting), it still chooses NovitaAI. Now at a 10% premium or whatever this would be whatever behaviour, NovitaAI is slightly faster, but not by that much.
But at a 10x premium? Why would it even use anything but Deepseek as provider?

Fixed it by using a whitelist and only allowing deepseek as provider, suddenly I was losing cents per hour instead of dollars. This shouldn't happen in my opinion.

reddit.com
u/sdziscool — 6 days ago

Deeseek R1 0528

hi all! i have $10 credits and would like to set up the titled model for my rp on jai. I’ve recently cancelled a subscription to another platform (sounds like shoots) and hope to be guided in the right path here.

reddit.com
u/Embarrassed-Refuse74 — 6 days ago
▲ 1 r/openrouter+1 crossposts

Decrease the token count as the model reply slowly

Hi I started using openclaw. Had multiple issues with it with gateway, models and setup.

It is now working but the main issue is of the token count.

I am using gpt oss 120 B and facing the issue of slow reply.

I am using openrouter api and the model is in itself free so i know that it might be slow.
To get this Straight every small task dumps all the files in context I know that I just want to know how I can decrease the token count.

It sends nearly 18K tokens per input and the token/sec output is sometimes 2-4 token/sec.

It has gone to nearly 10 to 20 sometimes but mostly slow.

How can I reduce it. Help guys!!!

reddit.com
u/Grand_Competition_99 — 7 days ago

Free Model Suggestion: Ai-asssistant

I want some free models suggestion that can do complex tasks on computer because I am building an A.i assistant that can do work, so I wanna test it.. Please recoomend any models from OpenRouter that are free.

reddit.com
u/Smooth_Carob_5054 — 8 days ago
▲ 9 r/openrouter+1 crossposts

Are DeepSeek models on OpenRouter via NovitaAI and SiliconFlow the same quality as official DeepSeek?

I have been using DeepSeek V4 for coding tasks through both OpenRouter and the official DeepSeek platform. Today, I checked the OpenRouter logs and noticed that the provider for the DeepSeek model was not DeepSeek itself, but NovitaAI and SiliconFlow instead.

Now I am wondering whether these providers deliver the same quality as the original DeepSeek service or if the quality is degraded in some way.

If the quality is identical or even slightly worse, I feel like I might stop using OpenRouter and just use DeepSeek directly instead. After all, DeepSeek is the company that actually created the model, while other providers are essentially hosting it and making money from it. I would rather have that revenue go directly to DeepSeek so their team has more resources to continue improving the model.

What do you guys think?

reddit.com
u/Existing_Arrival_702 — 9 days ago