
Hi folks,
Enjoy an optimised Qwen3.6 35B-A3B and Qwen3.6 27B for coding and general-purpose use. Both also solve puzzles correctly more often.
The initial intent was to optimise the 35B-A3B reasoning traces, since it's the most efficient model on my 5090 setup: I can run parallel jobs with llama.cpp in prod.
I love the 27B's consistency, but the prefill churn on long-horizon work is painful.
I tweaked the GBNF and tested it on everything from a basic prompt up to my custom Rust/Next.js bench to see the improvements. I have to say, the 35B-A3B had the nicest uplift.
I tested a simple "Hi" prompt, a puzzle, and my custom Rust/Next.js bench (60-task suite).
I included the "Hi" prompt because the community rightly complained about the reasoning drag on simple prompts with the 35B-A3B.
Tested Specs
- RTX 5090
- Fedora 43
- llama.cpp mainline April 24th
- Qwen3.6-35B-A3B-APEX-I-Balanced.gguf (-c 216k)
- Qwen3.6-27B-Uncensored-HauhauCS-Aggressive-Q6_K_P.gguf (-c 114k)
- KV cache: f16
- `-b` and `-ub` (batch and micro-batch size): 256
- Qwen's recommended sampling for reasoning + coding
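For anyone who wants to reproduce the setup, here's a minimal sketch of how the specs above might map onto a `llama-server` launch with the grammar loaded through `--grammar-file`. It's written in Python only to keep all examples in one language. Several values are assumptions not stated in this post: the binary name, full GPU offload (`-ngl 99`), the grammar filename `think.gbnf`, reading "216k" as 216 × 1024, and the sampling values, which are modelled on Qwen3's published thinking-mode recommendations rather than confirmed for Qwen3.6. Whether a CLI grammar applies to every request may depend on your build; sending `grammar` per request is the alternative.

```python
import shlex

# Hypothetical launcher for the tested specs. Anything marked ASSUMED is not
# stated in the post; adjust to your own setup.
SAMPLING = {  # ASSUMED: Qwen3-style thinking-mode values, check the Qwen3.6 card
    "--temp": "0.6",
    "--top-p": "0.95",
    "--top-k": "20",
    "--min-p": "0",
}


def llama_server_argv(model: str, ctx: int, grammar_file: str) -> list[str]:
    argv = [
        "llama-server",                  # ASSUMED binary name, on PATH
        "-m", model,
        "-c", str(ctx),                  # context size
        "-b", "256",                     # logical batch size (from the specs)
        "-ub", "256",                    # physical micro-batch size (from the specs)
        "--cache-type-k", "f16",         # KV cache kept at f16
        "--cache-type-v", "f16",
        "-ngl", "99",                    # ASSUMED: offload all layers to the GPU
        "--grammar-file", grammar_file,  # the GBNF grammar from this post
    ]
    for flag, value in SAMPLING.items():
        argv += [flag, value]
    return argv


argv = llama_server_argv(
    "Qwen3.6-35B-A3B-APEX-I-Balanced.gguf",
    216 * 1024,    # ASSUMED reading of "216k"
    "think.gbnf",  # ASSUMED filename for the grammar
)
print(shlex.join(argv))
```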
| Model | Test | Without grammar | With grammar | Improvement |
|---|---|---|---|---|
| Qwen3.6 27B | Hi tokens | 248 | 42 | 83.1% fewer (5.90x) |
| Qwen3.6 27B | Puzzle tokens | 40,101 | 7,376 | 81.6% fewer (5.44x) |
| Qwen3.6 27B | Puzzle time | 13m36s | 2m27s | 82.0% less time (5.55x speedup) |
| Qwen3.6 27B | Bench score | 4620 | 4620 | unchanged |
| Qwen3.6 27B | Bench time | 29m54s | 22m20s | 25.3% less time (1.34x speedup) |
| Qwen3.6 27B | Bench throughput | 1067 t/s | 1193 t/s | +11.8% (+126 t/s) |
| Qwen3.6 35B-A3B | Hi tokens | 200 | 12 | 94.0% fewer (16.67x) |
| Qwen3.6 35B-A3B | Puzzle tokens | 30,096 | 2,592 | 91.4% fewer (11.61x) |
| Qwen3.6 35B-A3B | Puzzle time | 2m32s | 12s | 92.1% less time (12.67x speedup) |
| Qwen3.6 35B-A3B | Bench score | 4620 | 4740 | +2.6% (+120) |
| Qwen3.6 35B-A3B | Bench time | 33m52s | 11m04s | 67.3% less time (3.06x speedup) |
| Qwen3.6 35B-A3B | Bench throughput | 1844 t/s | 2195 t/s | +19.0% (+351 t/s) |
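The Improvement column is plain arithmetic on the two raw columns. Here's a small Python sketch that recomputes it (the helper names are my own, purely illustrative): reductions are (before − after) / before plus the before/after ratio, and gains are (after − before) / before plus the absolute delta. Running it over the raw numbers reproduces every figure in the table.

```python
import re


def to_seconds(t: str) -> int:
    """Parse durations like '13m36s', '11m04s' or '12s' into seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?(\d+)s", t)
    if m is None:
        raise ValueError(f"unrecognised duration: {t!r}")
    return int(m.group(1) or 0) * 60 + int(m.group(2))


def reduction(before: float, after: float) -> tuple[float, float]:
    """Percent reduction and before/after ratio (for tokens and wall time)."""
    return round(100 * (before - after) / before, 1), round(before / after, 2)


def gain(before: float, after: float) -> tuple[float, float]:
    """Percent increase and absolute delta (for score and throughput)."""
    return round(100 * (after - before) / before, 1), after - before


# Qwen3.6 35B-A3B rows from the table above
print(reduction(200, 12))                                     # (94.0, 16.67)
print(reduction(to_seconds("33m52s"), to_seconds("11m04s")))  # (67.3, 3.06)
print(gain(1844, 2195))                                       # (19.0, 351)
```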
Total score and finish time are the key metrics for the chart; accuracy per memory is just a personal reference.
Qwen3.6 35B-A3B moved from X6 -> X1, taking the chart lead with a massive time reduction and a score bump.
Qwen3.6 27B moved from X4 -> X3 thanks to a better finishing time; its score held steady.
Total throughput recorded across the whole benchmark:
Qwen3.6 35B-A3B APEX I-Balanced: 1844 -> 2195 t/s
Qwen3.6 27B Uncensored HauhauCS Aggressive Q6_K_P: 1067 -> 1193 t/s
The Rust/Next.js bench is script-injected sequentially through OpenCode and runs against a production repo for financial applications, so it isn't publicly shared.
Puzzle Prompt
It's worth noting that the 35B-A3B struggled immensely with this puzzle. It would occasionally loop towards the end of its CoT or reach incorrect answers. Since a run took 12s instead of over 2m, it was easy to retry until it got the correct answer.
You are given a constrained planning problem. Think carefully, verify each condition, and do not skip impossibility checks.
Problem:
A courier starts at point S and must visit exactly once each of the locations A, B, C, D, and E, then end at T.
Travel times (in minutes) are symmetric:
S-A 4, S-B 6, S-C 8, S-D 7, S-E 9
A-B 5, A-C 7, A-D 3, A-E 8
B-C 4, B-D 6, B-E 5
C-D 5, C-E 3
D-E 6
A-T 8, B-T 6, C-T 5, D-T 7, E-T 4
Constraints:
1. C cannot be visited before B.
2. D must be visited immediately after A.
3. E cannot be the last location before T.
4. Total travel time must be less than 28 minutes.
5. Exactly one of these must be true:
- B is visited second
- C is visited fourth
6. If A is visited first, then B must be visited third.
7. The route must include at least one step whose travel time is exactly 3 minutes.
Task:
Determine whether a valid route exists.
- If it exists, provide one valid route and its total time.
- If it does not exist, prove why no valid route can satisfy all constraints.
- Show your reasoning clearly and check every constraint explicitly.
- Do not guess. If multiple routes seem possible, test them against all rules before concluding.
Output format:
1. Conclusion: VALID ROUTE EXISTS / NO VALID ROUTE EXISTS
2. Route: ...
3. Total time: ...
4. Constraint check: ...
5. Brief proof: ...
The correct answer is NO VALID ROUTE EXISTS. The models churn heavily on this one.
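Because the expected answer is a negative result, it's easy to confirm by brute force: there are only 5! = 120 visiting orders. The Python sketch below is my own check, not part of the prompt or the bench. It enumerates every order and tests the seven constraints, then lists the orders that fail only the time limit. That second list shows why no route exists: C can never be fourth without breaking another rule, so B must be second, and the only orders satisfying everything else take 32 and 35 minutes. Constraint 7 comes for free, since rule 2 forces the 3-minute A-D leg.

```python
from itertools import permutations

# Symmetric travel times in minutes, copied from the puzzle prompt.
EDGES = {
    ("S", "A"): 4, ("S", "B"): 6, ("S", "C"): 8, ("S", "D"): 7, ("S", "E"): 9,
    ("A", "B"): 5, ("A", "C"): 7, ("A", "D"): 3, ("A", "E"): 8,
    ("B", "C"): 4, ("B", "D"): 6, ("B", "E"): 5,
    ("C", "D"): 5, ("C", "E"): 3,
    ("D", "E"): 6,
    ("A", "T"): 8, ("B", "T"): 6, ("C", "T"): 5, ("D", "T"): 7, ("E", "T"): 4,
}
DIST = {frozenset(pair): minutes for pair, minutes in EDGES.items()}


def check(order):
    """Evaluate constraints 1-7 for a visiting order like ('E', 'B', ...)."""
    pos = {loc: i + 1 for i, loc in enumerate(order)}  # 1-based positions
    path = ["S", *order, "T"]
    legs = [DIST[frozenset(step)] for step in zip(path, path[1:])]
    total = sum(legs)
    rules = {
        1: pos["B"] < pos["C"],                 # C not before B
        2: pos["D"] == pos["A"] + 1,            # D immediately after A
        3: pos["E"] != 5,                       # E not last before T
        4: total < 28,                          # under 28 minutes
        5: (pos["B"] == 2) != (pos["C"] == 4),  # exactly one of the two holds
        6: pos["A"] != 1 or pos["B"] == 3,      # A first implies B third
        7: 3 in legs,                           # some leg takes exactly 3 min
    }
    return rules, total


def valid_routes():
    return ["".join(o) for o in permutations("ABCDE") if all(check(o)[0].values())]


def time_only_failures():
    """Orders that satisfy every rule except the 28-minute limit."""
    out = []
    for o in permutations("ABCDE"):
        rules, total = check(o)
        if not rules[4] and all(ok for n, ok in rules.items() if n != 4):
            out.append(("".join(o), total))
    return out


print(valid_routes())        # []
print(time_only_failures())  # [('EBADC', 32), ('EBCAD', 35)]
```

Here `EBADC` stands for the route S → E → B → A → D → C → T.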
GBNF Grammar
root ::= think out
think ::= "<think>\n" "Q=" q "\n" "M=" m "\n" "K=" toks "\n" "R=" toks "\n" "V=" v "\n" "</think>\n\n"
q ::= "solve" | "prove" | "route" | "debug" | "patch" | "code" | "calc" | "compare" | "explain"
m ::= "case" | "enum" | "check" | "derive" | "edit" | "test" | "trace" | "rank"
v ::= "ok" | "fail" | "done" | "blocked" | "candidate" | "verify"
toks ::= tok ("," tok){0,4}
tok ::= [A-Za-z][A-Za-z0-9_.!<>=/-]{0,18}
out ::= [\x09\x0A\x0D\x20-\x7E]+
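To see what the grammar actually forces, it helps to check logged replies against it outside llama.cpp. Here's a rough, hand-translated Python regex mirror of `root ::= think out`; the regex and the `split_reply` helper are my own illustration, not part of llama.cpp. One consequence worth knowing: the `out` rule only admits tab, LF, CR and printable ASCII, so answers can't contain accented letters, emoji or other non-ASCII text while the grammar is active. `K=` and `R=` each take 1 to 5 comma-separated tokens of at most 19 characters.

```python
import re

# Regex mirror of the GBNF grammar above (illustrative, hand-translated).
Q = "solve|prove|route|debug|patch|code|calc|compare|explain"
M = "case|enum|check|derive|edit|test|trace|rank"
V = "ok|fail|done|blocked|candidate|verify"
TOK = r"[A-Za-z][A-Za-z0-9_.!<>=/-]{0,18}"  # tok: 1 to 19 chars
TOKS = rf"{TOK}(?:,{TOK}){{0,4}}"           # toks: 1 to 5 comma-separated toks

THINK = re.compile(
    rf"<think>\nQ=(?:{Q})\nM=(?:{M})\nK={TOKS}\nR={TOKS}\nV=(?:{V})\n</think>\n\n"
)
OUT = re.compile(r"[\x09\x0A\x0D\x20-\x7E]+")  # out: tab/LF/CR + printable ASCII


def split_reply(text):
    """Return (think_block, answer) if text fits `root ::= think out`, else None."""
    m = THINK.match(text)
    if m is None:
        return None
    answer = text[m.end():]
    if OUT.fullmatch(answer) is None:
        return None
    return m.group(0), answer


reply = "<think>\nQ=explain\nM=check\nK=greeting\nR=reply\nV=done\n</think>\n\nHi! How can I help?"
print(split_reply(reply)[1])  # Hi! How can I help?
```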
I've only noticed some thinking tags leaking outside the CoT on Open WebUI.
Apart from that, it works without issue on Hermes, llama.cpp's WebUI and OpenCode.
I didn't have more time to spend on this in prod (it's already past my bedtime), but I hope it gives your setup a boost.