r/kilocode


Recently, a lot of LLM providers have started limiting their coding plans with quotas, and I wanted to know how much I would need to spend on API keys for the same usage on a daily/monthly basis.

I decided to make this extension to visualize all my token usage based on actual usage from Kilo Code, so I can correctly model the estimated cost if I were to use API keys instead.

I also added a status bar item that shows the z.ai LLM provider's quota usage before it resets.
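For anyone who wants to do the same modeling by hand, the arithmetic is just token counts times a per-million-token price. A minimal sketch in Python, assuming hypothetical prices (the numbers below are placeholders, not any provider's real rates) and assuming cached input tokens are billed at a separate, lower rate:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Price:
    """Hypothetical per-million-token prices in USD (placeholders, not real rates)."""
    input_per_mtok: float
    output_per_mtok: float
    cached_input_per_mtok: float = 0.0


def estimate_cost(input_tokens: int, output_tokens: int, price: Price,
                  cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of a batch of requests from raw token counts."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_input_tokens
    total = (uncached * price.input_per_mtok
             + cached_input_tokens * price.cached_input_per_mtok
             + output_tokens * price.output_per_mtok)
    return total / 1_000_000


def monthly_projection(daily_costs: List[float], days: int = 30) -> float:
    """Scale the average observed daily cost up to a month."""
    if not daily_costs:
        return 0.0
    return sum(daily_costs) / len(daily_costs) * days


if __name__ == "__main__":
    price = Price(input_per_mtok=1.0, output_per_mtok=4.0, cached_input_per_mtok=0.1)
    day = estimate_cost(2_000_000, 500_000, price, cached_input_tokens=1_000_000)
    print(f"one day: ${day:.2f}, month: ${monthly_projection([day]):.2f}")
```

Averaging observed days and scaling by 30 is the simplest projection; weekday/weekend differences in usage would need something finer.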

u/timx88 — 1 day ago

Dumb question: is there any phone support for Kilo Code?

Long story short, I have the Kilo extension on my laptop and I love it. I've been using it nonstop since December 2025.

I just wish I could keep prompting and working when I'm away from home and my laptop.

reddit.com
u/Cocoa_Pug — 1 day ago

Kilo Code refuses to respect context size

Hi,

I was using Roo Code until it was recently discontinued, so I switched to Kilo Code. With help from Google's AI, I've been trying for almost two days to get it working correctly, with no success: it keeps overflowing my model's context size.
Note: I was using Roo Code with LM Studio, but then switched to llama.cpp.
This is my `llama-qwen.zsh` script for launching `llama-server`:

```zsh
#!/usr/bin/zsh

cd /home/user/bin/

# Load aliases and clean system caches
setopt aliases
source ~/.zshrc
clearcache

# Function to reclaim RAM disk space
cleanup() {
  echo "\n[System] Cleaning up RAM cache at /dev/shm/llama_cache..."
  rm -rf /dev/shm/llama_cache
}

# Trap EXIT (script finish), INT (Ctrl+C), and TERM (kill)
trap cleanup EXIT INT TERM

# Create fresh RAM cache directory
mkdir -p /dev/shm/llama_cache

echo "[System] Starting llama-server with RAM cache..."

llama-server \
  --slot-save-path /dev/shm/llama_cache \
  -m "/home/user/.lmstudio/models/DuoNeural/Qwen3.6-35B-A3B-Code-imatrix-GGUF/qwen36_35b_Q5_K_M.gguf" \
  --n-gpu-layers 41 \
  --n-cpu-moe 31 \
  --ctx-size 24576 \
  --parallel 1 \
  --flash-attn on \
  --cache-type-k q8_0 \
  --cache-type-v q8_0 \
  --threads 4 \
  --threads-batch 4 \
  --split-mode none \
  --batch-size 2048 \
  --ubatch-size 512 \
  --mlock \
  --reasoning on \
  --chat-template-kwargs '{"preserve_thinking": true}' \
  --host 0.0.0.0 \
  --port 8080 \
  --temp 0.3 \
  --top-k 40 \
  --top-p 0.9 \
  --min-p 0.08 \
  --repeat-penalty 1.1 \
  --repeat-last-n 64 \
  --cache-prompt \
  --n-predict -1
```

When I was using Roo Code, it condensed context often but still got the work done. Now Kilo Code shows a red text box inside its extension GUI in VS Code saying:

```
request (44775 tokens) exceeds the available context size (32768 tokens), try increasing it
{
  "name": "ContextOverflowError",
  "data": {
    "message": "request (44775 tokens) exceeds the available context size (32768 tokens), try increasing it",
    "responseBody": "{\"error\":{\"code\":400,\"message\":\"request (44775 tokens) exceeds the available context size (32768 tokens), try increasing it\",\"type\":\"exceed_context_size_error\",\"n_prompt_tokens\":44775,\"n_ctx\":32768}}"
  }
}
```
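The `responseBody` in that error already carries the two numbers that matter. A small Python sketch that pulls them out of llama-server's error JSON and shows how far over the limit the request was (the function name is hypothetical, and this only diagnoses the overflow, it doesn't fix it):

```python
import json
from typing import Optional, Tuple

# responseBody copied verbatim from the Kilo error above.
RESPONSE_BODY = (
    '{"error":{"code":400,"message":"request (44775 tokens) exceeds the available '
    'context size (32768 tokens), try increasing it","type":"exceed_context_size_error",'
    '"n_prompt_tokens":44775,"n_ctx":32768}}'
)


def context_overflow(response_body: str) -> Optional[Tuple[int, int, int]]:
    """Return (prompt_tokens, n_ctx, overflow) for a context-size error, else None."""
    error = json.loads(response_body).get("error", {})
    if error.get("type") != "exceed_context_size_error":
        return None
    prompt, n_ctx = error["n_prompt_tokens"], error["n_ctx"]
    return prompt, n_ctx, prompt - n_ctx


if __name__ == "__main__":
    print(context_overflow(RESPONSE_BODY))  # prints (44775, 32768, 12007)
```

So that particular request was roughly 12k tokens over what the server would accept.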

My `kilo.jsonc` looks like this:

```jsonc
{
  "$schema": "https://kilo.ai",
  "model": "llama-cpp/qwen3.6-35b-a3b",
  "small_model": "llama-cpp/qwen3.6-35b-a3b",
  "agent": {
    "concurrency": {
      "limit": 1
    },
    "limit": {
      "context": 32768,
      "input": 28000,
      "output": 4096
    },
    "plan": {
      "model": "llama-cpp/qwen3.6-35b-a3b"
    },
    "debug": {
      "model": "llama-cpp/qwen3.6-35b-a3b"
    },
    "orchestrator": {
      "model": "llama-cpp/qwen3.6-35b-a3b"
    },
    "ask": {
      "model": "llama-cpp/qwen3.6-35b-a3b"
    },
    "code": {
      "model": "llama-cpp/qwen3.6-35b-a3b"
    }
  },
  "provider": {
    "llama-cpp": {
      "name": "Local Qwen3.6-35b-a3b",
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://localhost:8080/v1"
      },
      "models": {
        "qwen3.6-35b-a3b": {
          "name": "Qwen3.6 35b A3B",
          "context_window": 32768,
          "max_input_tokens": 22000,
          "reasoning": true,
          "variants": {
            "thinking": {
              "enable_thinking": true,
              "chat_template_args": {
                "enable_thinking": true
              }
            }
          }
        }
      }
    }
  },
  "instructions": [
    "/home/user/proj/kilocode/INSTRUCTIONS.md"
  ],
  "permission": {
    "bash": "allow"
  }
}
```

llama-server output:

```text
...
reasoning-budget: deactivated (natural end)
slot print_timing: id  0 | task 2 | 
prompt eval time =   72230.52 ms / 13675 tokens (    5.28 ms per token,   189.32 tokens per second)
       eval time =   10275.10 ms /   157 tokens (   65.45 ms per token,    15.28 tokens per second)
      total time =   82505.63 ms / 13832 tokens
slot      release: id  0 | task 2 | stop processing: n_tokens = 13831, truncated = 0
srv  update_slots: all slots are idle
srv  params_from_: Chat format: peg-native
slot get_availabl: id  0 | task -1 | selected slot by LCP similarity, sim_best = 0.440 (> 0.100 thold), f_keep = 1.000
reasoning-budget: activated, budget=2147483647 tokens
slot launch_slot_: id  0 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> ?top-p -> min-p -> ?xtc -> temp-ext -> dist 
slot launch_slot_: id  0 | task 484 | processing task, is_child = 0
slot update_slots: id  0 | task 484 | new prompt, n_ctx_slot = 24576, n_keep = 0, task.n_tokens = 31422
srv    send_error: task id = 484, error: request (31422 tokens) exceeds the available context size (24576 tokens), try increasing it
slot      release: id  0 | task 484 | stop processing: n_tokens = 13831, truncated = 0
srv          stop: cancel task, id_task = 484
srv  update_slots: no tokens to decode
srv  update_slots: all slots are idle
srv  log_server_r: done request: POST /v1/chat/completions 127.0.0.1 400
srv  params_from_: Chat format: peg-native
slot get_availabl: id  0 | task -1 | selected slot by LRU, t_last = 148257259339
srv  get_availabl: updating prompt cache
srv   prompt_save:  - saving prompt with length 13831, total state size = 206.587 MiB
srv          load:  - looking for better prompt, base f_keep = 0.000, sim = 0.002
srv          load:  - found better prompt with f_keep = 0.426, sim = 0.331
srv        update:  - cache state: 1 prompts, 395.027 MiB (limits: 8192.000 MiB, 24576 tokens, 286824 est)
srv        update:    - prompt 0x55d63dd3c310:   13831 tokens, checkpoints:  3,   395.027 MiB
srv  get_availabl: prompt cache update took 241.41 ms
...
```
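One thing visible in that log: the slot reports `n_ctx_slot = 24576` (matching `--ctx-size 24576` with `--parallel 1` in the script), while `kilo.jsonc` declares a 32768-token context in both `agent.limit.context` and `context_window`, so the client appears to believe it has more room than the server slot provides. A rough Python sketch of that sanity check; it assumes llama-server splits `--ctx-size` evenly across `--parallel` slots when the KV cache is not unified (an assumption worth verifying against your llama.cpp build), and the helper names are hypothetical:

```python
import re
from typing import Optional, Tuple

# Log line copied from the llama-server output above.
LOG_LINE = ("slot update_slots: id  0 | task 484 | new prompt, "
            "n_ctx_slot = 24576, n_keep = 0, task.n_tokens = 31422")


def parse_slot_line(line: str) -> Optional[Tuple[int, int]]:
    """Extract (n_ctx_slot, task.n_tokens) from a 'new prompt' log line."""
    m = re.search(r"n_ctx_slot = (\d+).*?task\.n_tokens = (\d+)", line)
    return (int(m.group(1)), int(m.group(2))) if m else None


def client_limit_fits(client_context: int, ctx_size: int, parallel: int = 1) -> bool:
    """Assumption: each slot gets ctx_size // parallel tokens (non-unified KV cache)."""
    return client_context <= ctx_size // parallel


if __name__ == "__main__":
    slot_ctx, requested = parse_slot_line(LOG_LINE)
    print(f"slot={slot_ctx} requested={requested} over_by={requested - slot_ctx}")
    # kilo.jsonc declares 32768; the script starts llama-server with --ctx-size 24576.
    print("client limit fits:", client_limit_fits(32768, 24576, parallel=1))
```

If that assumption holds, the client-side context limit would need to be no larger than the per-slot context (or `--ctx-size` raised to match, as the error message itself suggests).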

Is there any fix I could try, or should I switch to Cline until something is fixed?

Thanks in advance.

u/Lower-Ad6101 — 3 days ago

Any way to avoid these death loops?

Is it llama.cpp's fault? Doesn't Kilo detect death loops? What is happening? I'm tired, Boss...
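For context on what "detecting" a loop could even mean: one simple heuristic is to check whether the agent's most recent actions are the same short block repeated several times. A hypothetical Python sketch of that heuristic (this is not how Kilo or llama.cpp work internally; the function and its thresholds are made up for illustration):

```python
from typing import Hashable, Optional, Sequence


def detect_loop(actions: Sequence[Hashable], min_repeats: int = 3,
                max_period: int = 5) -> Optional[int]:
    """Return the period p if the last p * min_repeats actions are one block
    of length p repeated min_repeats times; otherwise return None."""
    n = len(actions)
    for period in range(1, max_period + 1):
        window = period * min_repeats
        if window > n:
            break
        tail = actions[n - window:]
        # Every item must equal the item at the same offset in the first block.
        if all(tail[i] == tail[i % period] for i in range(window)):
            return period
    return None


if __name__ == "__main__":
    history = ["read_file a.py", "edit a.py", "read_file a.py", "edit a.py",
               "read_file a.py", "edit a.py"]
    print(detect_loop(history))  # prints 2
```

Exact-match comparison is deliberately strict; real agent loops often vary slightly between iterations, so a practical detector would likely need fuzzier matching.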

u/Special-Lawyer-7253 — 3 days ago

Which is the best free model?

Hello,
I have been using Antigravity since it launched, and at that time Opus 4.5 was nearly unlimited; after a while they reduced the usage limits.

So I used Opus to plan and Gemini to execute, but now the same thing has happened with Gemini 3.1 Pro. So I am looking for alternatives.

I have Copilot via my student email, but it also has limited access to Claude models, and the Auto mode (which mostly gives GPT-5.4 mini) is awful. Earlier they had Codex 5.3, but they removed it as well.

I can't find any decent model that can execute the plans generated by Opus.

Fortunately, Kilo Code has multiple free models: some have been available for a long time, and others come and go. From that list, which one would be the smartest for complex reasoning? I need suggestions.

u/Level-Dig-4807 — 8 days ago

Spec vs. Sanity: Is Spec-Driven Development actually a productivity trap?

I’ve been trying to be a "good dev" lately by strictly following Spec-Driven Development (SDD). The theory is great: define everything upfront, reduce ambiguity, and then just execute.

But in practice? It’s making me incredibly slow, and honestly, the results are worse.

Here is what’s happening:

  • The Overhead: I spend so much time defining every edge case that by the time I start coding, I’m already mentally exhausted.
  • The "Big Bang" Failure: Because the specs are so detailed, the resulting implementation becomes this massive, monolithic PR. When I finally run it, it’s a nightmare to debug.
  • Missing the Flow: When I work with short, scoped-down implementations (the "build and iterate" approach), I catch errors early and the code feels much cleaner.

With SDD, I feel like I’m building a giant puzzle in the dark, only to find out at the end that half the pieces don't even fit the original frame.

Is anyone else feeling this? Have we over-corrected on "planning" to the point where we’ve lost the benefits of iterative development? Or am I just doing SDD wrong?

I'd love to hear how you guys balance deep technical specs with the need to actually keep things lean and bug-free.

u/Tricky_Cartoonist989 — 8 days ago

Feature preview: REVIEW.md support for Kilo Code Reviews

We just shipped an early preview of `REVIEW.md` support for Kilo Code Reviews.

You can now add a `REVIEW.md` file to the root of your repository to give Kilo custom review guidance for that repo. This lets you define the standards, conventions, risk areas, architectural patterns, and team-specific expectations that Kilo should consider when reviewing pull requests.

Why this is useful:

- More control over how Kilo reviews your code

- Repo-specific guidance instead of one-size-fits-all feedback

- Better alignment with your team’s engineering standards

- A simple way to call out areas Kilo should pay extra attention to

- Easier customization without changing every review prompt manually

For now, this is a feature preview. Organizations and individuals need to enable it manually by turning on `Use REVIEW.md` in:

https://app.kilo.ai/code-reviews

Next week, we’re planning to enable this by default for all organizations. You’ll still need to add a `REVIEW.md` file to your repository for Kilo to use it.

We’ll share a bigger announcement and more examples soon, but if you want to try it early, enable `Use REVIEW.md`, add a `REVIEW.md` file to your repo, and let Kilo start reviewing with your team’s own context.

u/Marian____ — 7 days ago

You're losing customers

The new update is horrible compared to how Architect mode worked. I know there are Kilo mods on here, but clearly they don't have any pull at the actual company. People keep speaking up and they aren't listening. I already have a few people who are switching because of this.

u/MomentImmortalizer — 9 days ago

What can the current v7 Kilo Code offer compared to Claude Code, Kimi Code, OpenCode, Hermes, Codex?

Hello!

I just heard about all the backlash from the community after the new version.

I've reached a point where I'm using mostly Claude, Kimi, Codex and Hermes for work-related tasks.

My question is: what makes Kilo Code, especially the CLI, different from or better than the other alternatives?

I'm not asking about generic stuff: which serious limitations does it address?

What can it do that others can't? What are the current rough edges after the final launch?

In which scenarios would you recommend Kilo Code/Kilo Code CLI over Claude Code or Codex now?

And please, don't treat this as a generic newbie question.

This question is deceptively simple and purposely aimed at the seasoned eye.

Thanks.

u/gglavida — 9 days ago

kilo with local ollama

I'm trying to get Kilo to communicate with my local Ollama server, but I'm not having any luck.

I've tried:

I haven't yet succeeded in connecting to my local Ollama.

Are the docs up to date? Is anyone running Kilo with a local Ollama as of today? Which config file actually works?
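Before touching any Kilo config, it may be worth confirming that the Ollama server itself answers. Ollama's native API lists installed models at `GET /api/tags` on port 11434 by default, and it also serves an OpenAI-compatible API under `/v1`. A minimal standard-library Python check, assuming the default address (adjust `base_url` if your server listens elsewhere); it says nothing about which Kilo config keys are correct:

```python
import json
import urllib.request
from typing import List, Optional


def parse_tags(body: str) -> List[str]:
    """Pull model names out of an Ollama /api/tags JSON response."""
    return [model["name"] for model in json.loads(body).get("models", [])]


def list_ollama_models(base_url: str = "http://localhost:11434",
                       timeout: float = 3.0) -> Optional[List[str]]:
    """Return installed model names, or None if the server can't be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_tags(resp.read().decode("utf-8"))
    except OSError:  # URLError, connection refused and timeouts are all OSError subclasses
        return None


if __name__ == "__main__":
    models = list_ollama_models()
    print("Ollama not reachable" if models is None else f"models: {models}")
```

If this prints the model list but Kilo still can't connect, the problem is more likely on the Kilo config side (base URL, provider type, model name) than on the Ollama side.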

u/synth_alice — 7 days ago

What's your opinion of Kilo Pass and your experience with it?

Hi all, I'm a scientific programmer and have been using free models and OpenRouter paid models for some time. My main paid models are GPT-5.5 for planning and DeepSeek V4 Pro for implementation. My token usage is about $20 with the heavily discounted DeepSeek V4 Pro model. DeepSeek's honeymoon pricing ends on May 31st, so I'm preparing to shift to a more affordable coding plan. I mainly use OpenCode as my coding agent, and have a list of my own curated skills.

The pricing model for Kilo Pass is quite attractive, so I'm wondering whether people are using it. What's your experience? Do you have any other suggestions for more affordable alternatives?

Thanks!

P.S. I did some research on Kilo Pass and found it does have limitations on where I can use it. So is it basically OpenRouter plus a bonus, if I route all my requests through the Kilo Gateway?

u/NeedHealingForFun — 10 days ago

Issue: Persistent Python Indentation Errors with Kilocode (Cursor + Claude Opus)

I would like to report an issue. I am constantly running into indentation errors when modifying and writing Python code using Kilocode. I've tried various methods to fix it, but nothing works; even extremely simple code blocks trigger these errors.

For context, my current setup is:

  • Editor: Cursor
  • Tool/Extension: Kilocode
  • Model: Claude Opus

I feel like this wasn't a problem before the recent Kilocode update; everything used to work perfectly fine.

Has anyone else encountered this kind of issue recently? Any help or insights would be greatly appreciated!
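Not a fix for the underlying bug, but one way to catch these quickly is to parse each edited file right after a change. A minimal Python sketch using the standard-library `ast` module; `IndentationError` and `TabError` are subclasses of `SyntaxError`, so a single `except` covers all three. The helper name is hypothetical:

```python
import ast
import sys
from typing import Optional


def check_python_source(source: str, filename: str = "<edited>") -> Optional[str]:
    """Return None if the source parses, else a one-line description of the error."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:  # also catches IndentationError and TabError
        return f"{type(exc).__name__} at line {exc.lineno}: {exc.msg}"
    return None


if __name__ == "__main__":
    # Usage: python check_indent.py file1.py file2.py ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            problem = check_python_source(fh.read(), filename=path)
        print(f"{path}: {problem or 'OK'}")
```

Mixed tabs and spaces are a common culprit when an editing tool rewrites only part of a block, and they surface here as `TabError`.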

u/Future-Location-8481 — 8 days ago

Please add back orchestrator mode

Please add back Orchestrator mode. Orchestrator made things so easy: I just described what I needed, and it switched between modes to gather more data, plan, debug, ask, code, etc.
I moved to Roo Code when Orchestrator was removed, but now Roo Code is stopping development. I will probably stay a couple more months on the last version of Roo Code and then see where I go. Between Claude Code and Kilo Code without Orchestrator, I will probably go with Claude Code (+ GLM Pro).

u/Antique_Archer_7110 — 11 days ago

Guys I need your honest opinion.

I was using Roo Code for 3 months and now I have to move, since the future of Roo Code is not very clear to me. I want to make sure I'm moving to the right tool. Would you advise me to come to Kilo Code or go somewhere else, and why? I'm a simple person and I mainly use DeepSeek and GLM 5.1. Thank you in advance for your help.

u/Comfortable-Mix-7805 — 11 days ago

Why openrouter and not something with no mark-up?

I just don't really understand why people are using OpenRouter instead of something like Vercel AI Gateway, or one of the other no mark-up providers. Am I missing something?

u/Prince55Slaya — 11 days ago

SCAM ALERT: r/toolsdeals & AI Ecosystem Store & Jubayer Hossain

Stay away from the subreddit r/toolsdeals and the seller Jubayer Hossain.

I want to warn everyone about a scam operation running through the WhatsApp channel "AI Ecosystem Store" and the Reddit community r/toolsdeals.

The Scam Breakdown:

  • The Mod is the Scammer: The moderator of r/toolsdeals, Jubayer Hossain (WhatsApp: +880 1626-852509), uses the subreddit to lure people into buying "cheap" AI accounts.
  • Bait & Switch: I paid for a 1-year subscription. Only after payment did he claim it was actually a month-to-month reactivation plan.
  • Ghosting & Theft: After 2 months, he refused to reactivate the account or provide a full refund, returning only a tiny fraction of the money.
  • Manipulative Tactics: He constantly deletes chat history on WhatsApp and cycles through burner accounts to hide negative feedback.

Verdict:

The entire r/toolsdeals subreddit was created by a scammer to provide a false sense of legitimacy. Do not trust the reviews or the "deals" posted there.

Avoid Jubayer Hossain and any services linked to AI Ecosystem Store.

u/VlaadislavKr — 10 days ago

Kilo with Opus (OpenRouter vs. Kilo as providers)

Hey, I tried to use Opus 4.7 in Plan mode to review and optimize my codebase. With OpenRouter (OR) as the provider and max thinking, Kilo repeatedly terminated while thinking or working, whereas with Kilo as the provider it never stopped. Any idea why?

u/dotanchase — 10 days ago

How to generate images with Kilo?

Is there a recommended 'workflow' for generating images with Kilo CLI?

I've set up Kilo and am using Kilo CLI on a Mac. I've subscribed to KiloPass, so I believe I do have access to frontier models that can generate images. But without a dedicated 'chat app' (like ChatGPT has) that lets the agent show me the image and get feedback, what is the workflow for asking Kilo to generate an image, viewing the image, and getting Kilo to make changes? Is there a "recommended workflow"?

u/DelicateFandango — 13 days ago

Kilo code or Cursor or (codex/claude code)

Hi guys, I'm ditching GitHub Copilot and want to know what you think. I'm planning to use Cursor or Kilo Code, but I don't know what the difference is, or how the limits compare. I'd also like to know whether you think these two are better than Codex/Claude Code. Can you explain why some people say Kilo Code is superior? Do the models have larger context windows? Are the limits more generous? Why is it good for complex projects?

u/BeautifulWestern7736 — 12 days ago