r/kilocode | reddlx

Is it possible to use a model but block certain providers of that model, e.g. providers in foreign countries or providers that log or train on inputs?

u/jordan_be — 20 hours ago

I built a "pantheon" of 7 specialist AI agents for KiloCode

Built oh-my-kilocode-slim — a KiloCode plugin that routes each coding task to whichever model does it best/cheapest.

The problem: I was burning tokens running Opus for "find the auth file" and Haiku for "design a system migration." One model per session is wasteful.

The setup — 7 specialist agents:

Chief — orchestrator. Talks to you, plans, delegates

Explorer — fast codebase recon (cheap model, e.g. Haiku/GPT-4-mini)

Librarian — live docs/library research (web-fetching)

Oracle — architecture review, complex debugging (Opus-tier, used sparingly)

Designer — UI/UX work

Fixer — bounded code edits (Sonnet-class, high volume)

Council — multi-LLM synthesis for hard decisions

What it actually does:

You talk to Chief, Chief dispatches background tasks

Each agent runs in its own tmux/Zellij pane — parallel by default

Custom preset: you pick which model backs each agent

Cost optimization comes from routing: research→mini, review→opus, edit→sonnet

Strong type-safety (Zod schemas), Bun + TypeScript, ESM
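To make the routing idea concrete, here is a minimal sketch of a task-kind to model-tier lookup. It assumes the three routes named above (research, review, edit); the `ROUTES` table, `Dispatch` type, and `route` function are hypothetical names for illustration and are not taken from the plugin's code, which is TypeScript.

```python
from dataclasses import dataclass

# Hypothetical routing table mirroring "research->mini, review->opus, edit->sonnet".
# These identifiers are illustrative and do not come from oh-my-kilocode-slim.
ROUTES = {
    "research": "mini",
    "review": "opus",
    "edit": "sonnet",
}


@dataclass(frozen=True)
class Dispatch:
    task: str
    kind: str
    tier: str


def route(task: str, kind: str, default_tier: str = "sonnet") -> Dispatch:
    """Pick a model tier for a task from its kind, falling back to a default."""
    return Dispatch(task=task, kind=kind, tier=ROUTES.get(kind, default_tier))


if __name__ == "__main__":
    for task, kind in [
        ("find the auth file", "research"),
        ("design a system migration", "review"),
    ]:
        print(route(task, kind))
```

The fallback tier is an assumption too; a real orchestrator would probably decide the kind with a model call rather than take it as an argument.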

Real numbers from my week:

Same refactor task:

Before: 1 Opus session, ~$4.20, 18 min

With plugin: 1 Opus (oracle review) + 3 Sonnet (fixer edits) + 2 Haiku (explore) = ~$1.10, 11 min
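For scale, the relative savings implied by those two data points can be worked out directly. This is just arithmetic on the figures quoted above (one task, one week), not a benchmark.

```python
def percent_saved(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


cost_saved = percent_saved(4.20, 1.10)  # dollars, from the post
time_saved = percent_saved(18, 11)      # minutes, from the post

print(f"cost: {cost_saved:.0f}% lower, time: {time_saved:.0f}% lower")
# prints "cost: 74% lower, time: 39% lower"
```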

Install:

bunx @emngny/oh-my-kilocode-slim@latest install

Repo: github.com/emngny/oh-my-kilocode-slim

License: MIT, v2.2.0, active development

Curious: anyone else experimenting with per-task model routing? What's your split?

reddit.com

u/cikibik — 1 day ago

▲ 187 r/kilocode+2 crossposts

How to set up DeepSeek Flash + GLM 5.2 advisor in OpenCode - the exact config

A few people asked how to set up the DS Flash + GLM 5.2 combo from my last post. Here's the exact config.

The idea: DeepSeek Flash handles the routine orchestration (cheap, fast, 1M context). GLM 5.2 steps in as an advisor subagent when the task needs actual reasoning. Flash pays ~$0.0003 per mechanical call. GLM only burns credits on the calls that need it.

The config

Add this to your opencode.jsonc. Any of these three locations works:

~/.config/opencode/opencode.jsonc           # global, all projects
~/.opencode/opencode.jsonc                   # project-level
.opencode/opencode.jsonc                     # per-repo, can commit

{
  "$schema": "https://opencode.ai/config.json",
  "default_agent": "deepseek-flash",
  "agent": {
    "deepseek-flash": {
      "description": "Primary agent. Fast, cheap orchestration for routine engineering work.",
      "mode": "primary",
      "model": "opencode-go/deepseek-v4-flash",
      "steps": 30
    },
    "glm-advisor": {
      "description": "Strategic advisor for second opinions, plan critique, and architecture tradeoffs.",
      "mode": "subagent",
      "hidden": true,
      "model": "opencode-go/glm-5.2",
      "steps": 15,
      "temperature": 0.3,
      "permission": {
        "read": "allow",
        "glob": "allow",
        "grep": "allow",
        "list": "allow",
        "webfetch": "allow",
        "edit": "deny",
        "write": "deny",
        "bash": "deny",
        "task": "deny",
        "question": "allow",
        "todowrite": "deny"
      }
    }
  },
  "provider": {
    "opencode-go": {
      "apiKey": "{env:OPENROUTER_API_KEY}"
    }
  }
}
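One way to sanity-check a config like this before relying on it is to confirm the advisor really is read-only. The sketch below embeds a trimmed copy of the glm-advisor permission block and reports any mutating tool that is not explicitly denied. Which tools count as "mutating" is an assumption made for this check, and `mutating_tools_allowed` is a hypothetical helper, not part of OpenCode.

```python
import json

# Trimmed copy of the glm-advisor block from the config above.
CONFIG = json.loads("""
{
  "agent": {
    "glm-advisor": {
      "mode": "subagent",
      "permission": {
        "read": "allow", "glob": "allow", "grep": "allow", "list": "allow",
        "webfetch": "allow", "edit": "deny", "write": "deny", "bash": "deny",
        "task": "deny", "question": "allow", "todowrite": "deny"
      }
    }
  }
}
""")

# Tools treated as mutating for this check; the grouping is an assumption.
MUTATING_TOOLS = {"edit", "write", "bash", "task", "todowrite"}


def mutating_tools_allowed(config: dict, agent: str) -> list[str]:
    """Return mutating tools that are not explicitly denied for the agent."""
    perms = config["agent"][agent].get("permission", {})
    return sorted(tool for tool in MUTATING_TOOLS if perms.get(tool) != "deny")


print(mutating_tools_allowed(CONFIG, "glm-advisor"))  # prints []
```

Note that a real opencode.jsonc may contain comments, which `json.loads` does not accept, so a JSONC-aware parser would be needed to run this against the file itself.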

How it works

Set default_agent to deepseek-flash. Flash handles every session, cheap and fast. When you hit a task that needs judgment -- architecture decision, plan critique, second opinion -- tell Flash to dispatch the glm-advisor subagent via the task tool.

Flash's system prompt already knows how to route:

Bounded mechanical work (classify, edit JSON, summarize): handles itself
Strategic work (tradeoffs, plan review, second opinions): dispatches to glm-advisor

Prompt for the advisor subagent (optional)

If you want the advisor to follow a consistent output format, save this as .opencode/prompts/glm-advisor.md:

You are a sharp, honest senior advisor. All context is inline in the prompt below.
Never reference files, external sources, or prior conversations.

Structure every response in three sections:
1. CONCLUSION -- your direct answer or recommendation in 1-3 sentences.
2. REASONING -- the key factors, evidence, or logic behind your conclusion.
3. WATCH OUT -- caveats, failure modes, or what may have been missed.

Be direct. If the question has no good answer, say so and explain why.
Do not hedge unnecessarily. Calibrate confidence honestly.

Then reference it in the config by adding to the glm-advisor block:

"prompt": "{file:.opencode/prompts/glm-advisor.md}"
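If the advisor follows that three-section format, the reply can be split mechanically, for example to log only the conclusion. A minimal sketch, assuming the model emits the headers exactly as written in the prompt (optionally numbered); `split_sections` is a hypothetical helper, and real replies may drift from the format.

```python
import re

SECTIONS = ("CONCLUSION", "REASONING", "WATCH OUT")

# Matches headers such as "1. CONCLUSION --" or "CONCLUSION:" at the start of a line.
HEADER = re.compile(
    r"^\s*(?:\d+\.\s*)?(CONCLUSION|REASONING|WATCH OUT)\b[\s:\-]*",
    re.MULTILINE,
)


def split_sections(reply: str) -> dict[str, str]:
    """Split an advisor reply into its three named sections.

    Raises ValueError if a section is missing, repeated, or out of order.
    """
    matches = list(HEADER.finditer(reply))
    if tuple(m.group(1) for m in matches) != SECTIONS:
        raise ValueError("expected CONCLUSION, REASONING, WATCH OUT in that order")
    parts = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(reply)
        parts[m.group(1)] = reply[m.end():end].strip()
    return parts


example = (
    "1. CONCLUSION -- Use Postgres.\n"
    "2. REASONING -- The team already runs it.\n"
    "3. WATCH OUT -- Migration downtime.\n"
)
print(split_sections(example)["CONCLUSION"])  # prints "Use Postgres."
```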

What changes

Before: GLM 5.2 running every call. Burning through opencode-go quota on routine work like "list files" and "run tests."

After: Flash handles everything routine. GLM only fires when you (or Flash) decide analysis is needed. In my usage, roughly 70-80% of calls stay on Flash. The other 20-30% use GLM, and those are the ones that actually needed it.

Why this split

Flash is $0.14/M input, GLM is $1.40/M. For classification and formatting work under 2K tokens, Flash costs about $0.0003 per call. GLM on the same task costs an order of magnitude more for no meaningful quality difference.

The advisor subagent pattern keeps GLM in reserve for the work it actually improves: multi-factor analysis, architecture judgment, and second opinions. Everything else stays on Flash.
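The per-call figures can be reproduced from the quoted input prices. This sketch counts input tokens only and assumes every call is 2K tokens, which is a simplification: output tokens and real call sizes will shift the numbers. The 70% and 80% Flash shares come from the usage split mentioned earlier.

```python
FLASH_PER_M = 0.14  # USD per million input tokens, from the post
GLM_PER_M = 1.40    # USD per million input tokens, from the post


def call_cost(tokens: int, price_per_million: float) -> float:
    """Input-token cost of a single call in USD."""
    return tokens / 1_000_000 * price_per_million


def blended_cost(tokens: int, flash_share: float) -> float:
    """Average cost per call when `flash_share` of calls stay on Flash."""
    return (flash_share * call_cost(tokens, FLASH_PER_M)
            + (1 - flash_share) * call_cost(tokens, GLM_PER_M))


flash = call_cost(2_000, FLASH_PER_M)  # about 0.00028
glm = call_cost(2_000, GLM_PER_M)      # about 0.0028

print(f"Flash: ${flash:.5f}  GLM: ${glm:.5f}  ratio: {glm / flash:.0f}x")
for share in (0.7, 0.8):
    print(f"{share:.0%} on Flash: ${blended_cost(2_000, share):.6f} per call on average")
```

Under these assumptions the blended cost lands between roughly a quarter and a third of the all-GLM cost, with the GLM calls still dominating the bill.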

Caveats

Flash and GLM both have 1M context. Context length is not a differentiator between them.
GLM's reasoning mode (effort=max) takes 60-120 seconds. Budget for it if you call it synchronously.
The advisor is read-only. It can read files, search, and fetch URLs, but cannot edit files, write code, or run commands. Output lands in the primary agent's context for review.
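On the latency caveat: if you wrap advisor calls in your own script, one way to budget for a slow reasoning call is to wait with a timeout and fall back if it is exceeded. A minimal sketch using Python's standard library; `call_with_budget` and `fake_advisor` are hypothetical stand-ins, and nothing here is an OpenCode API.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Optional


def call_with_budget(fn: Callable[[], str], budget_s: float) -> Optional[str]:
    """Run fn in a worker thread and stop waiting after budget_s seconds.

    Returns None on timeout. The worker is not cancelled; it keeps running
    in the background until fn returns.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=budget_s)
    except FutureTimeout:
        return None
    finally:
        # Do not block on a worker that may still be running.
        pool.shutdown(wait=False)


def fake_advisor() -> str:
    """Stand-in for a slow advisor call (hypothetical)."""
    time.sleep(0.2)
    return "advice"


print(call_with_budget(fake_advisor, budget_s=1.0))   # finishes within budget
print(call_with_budget(fake_advisor, budget_s=0.05))  # gives up and returns None
```

For the 60-120 second range quoted above, a budget somewhat over 120 seconds would be the natural starting point.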

u/ahriad — 4 days ago

▲ 8 r/kilocode+2 crossposts

I built `/steal` — a Cursor slash command that pulls in your latest Kilo Code / GLM 5.2 chat and vice versa

Cursor doesn't ship GLM 5.2 (or any Fireworks models), so a lot of us use Cursor for Opus 4.8 and something like Kilo Code + Fireworks for GLM. Great, until you want to move between them mid-task and end up re-explaining the whole context.

I wrote a tiny MIT tool that installs a `/steal` slash command in both editors. In Cursor, `/steal` pulls in your most recent Kilo session for the current project; in Kilo, `/steal` pulls in your most recent Cursor session. Direction is baked in: no arguments to remember. About 50 ms per call because it reads each tool's session store directly (Cursor's JSONL under `~/.cursor/projects/<slug>/agent-transcripts/`, Kilo's SQLite under `~/.local/share/kilo/kilo.db`) instead of scanning your whole history.

Install:

npm i -g steal-context
cd your-project
steal-context init

Then `/steal` in either tool.

Everything is local, read-only, MIT-licensed. Default handoff is 40 messages sized for Opus/GLM's context windows, adjustable.
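To illustrate the "read the session store directly" idea for the JSONL side, here is a minimal sketch that picks the newest `.jsonl` file in a directory and returns its last 40 records. It is not the tool's actual implementation: `latest_messages` is a hypothetical name, each line is treated as an opaque JSON object because the transcript schema is not described in the post, and the SQLite side is left out for the same reason.

```python
import json
from pathlib import Path


def latest_messages(transcript_dir: Path, limit: int = 40) -> list[dict]:
    """Return the last `limit` JSON records from the newest .jsonl file.

    Returns an empty list if the directory contains no .jsonl files.
    """
    files = sorted(transcript_dir.glob("*.jsonl"), key=lambda p: p.stat().st_mtime)
    if not files:
        return []
    records = []
    with files[-1].open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines between records
                records.append(json.loads(line))
    return records[-limit:]
```

Called with a transcript directory such as the Cursor path mentioned above, this returns at most 40 records in file order. It reads the whole file, so a large transcript would want a tail-style reader instead.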

Repo: https://github.com/alonsorobots/steal-context

Curious if others hit the same workflow gap and if there are other Cursor pairings worth supporting fast (Cline, Claude Code, Codex, etc.).

u/alonsorobots — 4 days ago

▲ 18 r/kilocode

Next-Edit just landed in Kilo and it's free for the next month

If you're on Kilo, Next-Edit is live and free for everyone through July 23. No trial and no card needed; it runs through the Kilo Gateway (the default).

It's powered by Mercury Edit 2, Inception's diffusion model. Instead of just completing ahead of your cursor, it looks at your recent edits and predicts your next change anywhere in the file, like finishing a refactor or propagating a rename. Hit Tab to accept, and because it's diffusion-based the suggestion comes back fast.

It's the new default for new users. If you'd already set an autocomplete default, switch to Next-Edit manually under Settings.

Disclosure: I work at Inception. Keen for feedback from anyone running it in Kilo, especially where it over-suggests or misses. How's it been so far?

u/apoorvumang — 7 days ago

▲ 2 r/kilocode

Kilo Code doesn't recognize Workspaces

If you have a multi-root workspace project in VS Code, Kilo Code constantly fails to find files, even files you've already shown it. It says no such file exists when the file is clearly in the workspace, even if you give it the path. Is there a way I can enable workspaces for it?

u/mfaine — 7 days ago

▲ 20 r/kilocode

Does Kilo Gateway support DeepSeek’s automatic Context Caching?

Hey everyone, I just ran a test comparing identical workloads on the exact same day directly through the DeepSeek API versus using the Kilo Gateway, and the price difference is shocking.

Direct DeepSeek API: Cost $0.64 for the day. (DeepSeek’s automatic Context Caching kicked in, giving me nearly a 90% cache hit rate on my long prompts/code files).
Kilo Gateway: Cost $2.34 for the exact same day.

This suggests that Context Caching is missing or bypassed when routing queries through Kilo. If so, you are paying full price for prompt inputs on every single request, which in my test made it about 3.7x more expensive and hits context-heavy tasks hardest (like coding with large files open).
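A rough way to see how much a cache can matter is a blended-price model. The sketch below computes the observed ratio from the two daily totals in the post, then shows what a 90% hit rate would do to input cost under a hypothetical price ratio. The assumption that cache hits are billed at 10% of the normal input price is illustrative only and is not DeepSeek's published pricing.

```python
def effective_input_price(miss_price: float, hit_price: float, hit_rate: float) -> float:
    """Average input price per token unit for a cache hit rate in [0, 1]."""
    return hit_rate * hit_price + (1 - hit_rate) * miss_price


observed_ratio = 2.34 / 0.64  # Kilo Gateway vs direct API, figures from the post
print(f"observed total-cost ratio: {observed_ratio:.2f}x")  # prints 3.66x

# Hypothetical prices: cache hits billed at 10% of the normal input price.
miss, hit = 1.0, 0.1
cached = effective_input_price(miss, hit, hit_rate=0.9)
print(f"input-only ratio at 90% hits: {miss / cached:.2f}x")  # prints 5.26x
```

Under that assumption the input-only gap is larger than the observed total gap, which would be consistent with output tokens being billed the same either way, but the post does not include the token breakdown needed to confirm it.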

Recommendation: If you are doing heavy, context-repetitive tasks, stop using Kilo for now and switch to a direct DeepSeek API key. Otherwise, you are literally throwing money away on unsaved cache. Check your dashboards!

Does Kilo Gateway support DeepSeek’s automatic Context Caching, or does it bypass it?

u/MultiBotRun — 7 days ago

▲ 25 r/kilocode

What made you choose Kilo Code over OpenCode, Claude Code, Codex, etc.?

I've recently started using Kilo Code, and my first impression is that it feels more polished and thoughtfully designed than some of the other coding agents I've tried.

That said, I'm still new to it, so I don't really know what makes it stand out under the hood.

I'm curious to hear from people who have experience with multiple coding agents, whether that's OpenCode, Claude Code, Codex, Cline, Roo Code, Cursor, Windsurf, or anything else.

Why did you settle on Kilo Code?

I'm especially interested in hearing from people who have used several of these tools extensively and still ended up preferring Kilo Code. What made you stick with it, and are there any areas where you think it genuinely does better than the alternatives?

Looking forward to hearing your experiences.

u/cryptoman_101 — 9 days ago

▲ 23 r/kilocode+1 crossposts

GLM 5.2 High is as good as GPT 5.5

https://preview.redd.it/opzzeoy9a39h1.png?width=1120&format=png&auto=webp&s=d1dabe7476f872a0e57b01efd07d6fb3f20cc7f3

Hello, I’ve given GLM 5.2 a big project of mine with multiple agents working on it together. It’s been pretty good! It has been running for over 24 hours and the results are surprisingly good. It had some hiccups, but the code and the project work fine, and the pricing is absolutely amazing. Feels like GPT-5.5 level for half a million tokens at around $46, which is also an absolute bargain. What do you guys think about GLM 5.2?

u/Brilliant_Throat_448 — 12 days ago

▲ 4 r/kilocode

Notifications not working?

Hey guys, I've recently moved from version 5.x to the latest version, and it seems notifications no longer work.

I saw the issue on GitHub, and it seems the PR related to it has been merged, but so far I've seen no notifications in VS Code. Does anyone know how to fix it?

u/luckypanda95 — 12 days ago