u/BlacksmithRadiant322 — reddlx

I’m done paying for LLMs until they learn token efficiency

Watching my agent spin in circles, re-explaining the same steps over and over before maybe doing something useful. I keep telling it use “/caveman full” skill — short, direct, no fluff — and it just ignores me. More verbose walls of text. More wasted tokens.

Why aren’t these models trained for agentic efficiency? Punish every unnecessary token. Reward getting the job done fast. Until then, I’m not paying to watch an LLM burn money while pretending to think.

reddit.com

u/BlacksmithRadiant322 — 7 days ago

▲ 2 r/AgentsOfAI

Is there a way to benchmark tokens/sec for the same model across providers?

I’m trying to compare throughput (tokens/sec) for the same model (e.g., DeepSeek V4 Flash) on different providers without having to manually test each one myself.

reddit.com

u/BlacksmithRadiant322 — 9 days ago

▲ 1 r/opencode

How can I configure opencode to follow certain rules?

I'd like to give a set of rules like these for it to try to follow those rules when refactoring code. Also a rule to always commit using conventional commits after meaningful changes.

reddit.com

u/BlacksmithRadiant322 — 14 days ago

▲ 2 r/hermesagent

reddit.com

u/BlacksmithRadiant322 — 15 days ago

▲ 1 r/hermesagent

I tried ollama/qwen3.5:0.8b but every time it pastes markdown, never accomplishes the tasks given.

reddit.com

u/BlacksmithRadiant322 — 15 days ago

▲ 1 r/opencode

I was using OpenCode with OpenCode's Zen MiniMax M2.5 Free and hit a rate limit. Switched to OpenRouter’s Gemma 4 31B free (different provider entirely), but I’m still seeing the same rate limit message.

That makes me think it’s not the upstream API but OpenCode itself clamping down. Does OpenCode have its own global rate limits per user/IP? Or could something else be cached/carrying over?

Anyone else run into this?

reddit.com

u/BlacksmithRadiant322 — 26 days ago