

Pi Agent makes very nice combination with limited hardware. Running qwen3.6 35B A3B IQ4 at ~22t/s with 160k context on 6 vram 64 RAM.
Some days ago I shared some findings regarding running qwen 3.6 in this repo https://github.com/igpdev/rtx4050-local-llm-qwen3.6-35B in case would help someone.
(Post copied from original llamallm as here is no option to reshare from other community)
After some tweaks playing around with llamacpp flags, found this config that allows quite nice and usable workflow with qwen 3.6 35B with 160k context using Bartowski IQ4_NL version
The key here is Pi Agent with its simplicity and small context, I did a small exercise with a prd document asking to build a simple habit tracker using nuxt framework and sqlite, and playwright for e2e testing.
It clearly does the job faster than wen using Opencode, (Yes, opencode is still usefull too, but with the limited speed regarding the setup, Pi feels very fluid). it made the right call tools to setup everything including the playwright e2e testing framework.
Pi agent is for local setups with small vram and some usefull RAM what Linux to old laptops. It can provide you with a very decent agentic workflow knowing how to define clear tasks. To make it simple, I just made the pi system prompt to be as silent as possible, given that I also prefer a ralph loop process that do not need verbosity but just to fullfill the goal.
Of course I have to admit is not oriented for users not understanding what they are doing, can be dangerous given its yolo default mode. I feel is oriented to users that love the neovim/emacs customization philosophy.
In case someone is interested or has suggestions here is the flags: ____
TURBO_LAYER_ADAPTIVE=1 llama-server \
-m ~/models/Qwen_Qwen3.6-35B-A3B-IQ4_NL.gguf \
--host 0.0.0.0 \
--port 8084 \
-ngl 999 \
-c 160000 \
-n 8192 \
-b 2048 \
-ub 2048 \
--cont-batching \
--threads 12 \
--threads-batch 16 \
--prio 2 \
--poll 50 \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--flash-attn on \
--cache-prompt \
--cache-reuse 512 \
--ctx-checkpoints 10 \
--n-cpu-moe 999 \
--temp 0.6 \
--min-p 0.05 \
--top-k 40 \
--top-p 0.95 \
--repeat-penalty 1.05 \
--jinja \
--reasoning auto \
--reasoning-budget 8192 \
--no-mmap
____
And same disclaimer. I am not an expert, I just keep experimenting pushing to the limit that low spec machine. One really starts to learn a lot when going local.