u/Express_Quail_1493 — reddlx

▲ 2 r/ollama+1 crossposts

Noticing qwen-27b@q2 better than qwen-35b@q8?

The latest qwen3.6 models. Is this odd? I code with qwen models, and the 27b@q2, even heavily quantised, performs way better than the 35b@q8.

Has anyone else tested across quant levels?

Edit: for anyone asking about quants and setup, I'm seeing this on the Unsloth dynamic k_xl quants:
qwen3.6-27b-UD-q2_k_xl and qwen-3.5-35b-UD-Q8.
Latest llama.cpp, used through opencode. The Unsloth dynamic quant makes the q2 more usable than expected.

For some odd reason I find the 35b-a3b really smart but at the same time kind of dumb. It feels like I'm using a 4b model rather than a 35b. I suspect an MoE model's behavioural capacity is tied to the number of active params rather than the total: total params may only contribute to how much the model knows, not to how complex a task it can execute.

For my use case I need it to handle complexity more than I need accuracy. But I don't think enough active params light up to cover the complexity of the task, and that makes the 35b-a3b go wonky. Maybe I should only give the 35b-a3b baby tasks? I need a bit more investigation to close in on that conclusion. It would be helpful if anyone else could test this too.
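
A rough back-of-envelope sketch of the numbers behind this hunch. The parameter counts are taken from the model names (27b dense, 35b total with roughly 3b active), and the 2-bit and 8-bit widths are nominal assumptions only: real k-quants and dynamic quants mix precisions, so actual file sizes will differ. The function names are made up for illustration.

```python
# Back-of-envelope comparison; every number here is a rough assumption.

def active_fraction(total_b: float, active_b: float) -> float:
    """Share of parameters used per token (1.0 for a dense model)."""
    return active_b / total_b


def nominal_weight_gb(params_b: float, bits_per_weight: float) -> float:
    """Nominal weight size in GB: billions of params * bits / 8."""
    return params_b * bits_per_weight / 8


# (total params in B, active params in B, nominal bits per weight)
models = {
    "27b dense @ q2": (27.0, 27.0, 2.0),
    "35b-a3b MoE @ q8": (35.0, 3.0, 8.0),
}

for name, (total_b, active_b, bits) in models.items():
    print(
        f"{name}: {active_b:.0f}B active per token "
        f"({active_fraction(total_b, active_b):.0%} of total), "
        f"~{nominal_weight_gb(total_b, bits):.1f} GB of weights"
    )
```

On these assumptions the dense 27b pushes every token through about nine times as many parameters as the 35b-a3b, even though its weights are far smaller. That is consistent with the hunch above, but it is not evidence for it.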


u/Express_Quail_1493 — 5 days ago

▲ 79 r/LocalLLaMA

500k context on 48gb VRAM!! - 21tok/s (coding)

I found this model hiding in the corner of huggingface: https://huggingface.co/Max-and-Omnis/Nemotron-3-Super-64B-A12B-Math-REAP-GGUF

It looks to be tuned specifically for math, but I thought I'd give it a try since I can't run the full 120b Nemotron Super, and it seems to hold up like a champ in agentic coding for some odd reason. I've been using it to code all my projects for a week now and it's amazing. I never would have dreamed of having 500k tokens of context on my potato dual TITAN RTX.

If you do happen to try it, drop a comment on your experience with it: where did it break, what use case did you use it for, etc.

u/Express_Quail_1493 — 11 days ago

▲ 1 r/LocalLLaMaCoders

Aren't these single-file LLM coding tests like browserOS pretty much redundant now that most 2026 LLMs can easily handle them? In what other ways can we stress-test these models on novel coding problems?


u/Express_Quail_1493 — 1 month ago

▲ 1 r/LocalLLaMaCoders

Honestly, I'm thinking that past the 70b mark most of the improvements are slim.

From 4b -> 8b is wide

8b -> 14b is still wide

14b -> 30b nice-to-have territory

30b -> 80b negligible

80b -> 300b or 900b barely

What are your thoughts?


u/Express_Quail_1493 — 2 months ago

▲ 3 r/LocalLLaMaCoders

I want to get a feel for what other people's local agentic coding setups are like, and what your biggest performance constraint is with a fully local coding setup.


u/Express_Quail_1493 — 2 months ago