u/GotHereLateNameTaken

Qwen can't wait to release 3.7 models

Qwen 27b MTP Config, Llama.cpp Single 3090

What setup are you using for Qwen 27B on a single 3090?

Here's what I started using today. With a 64K context (-c 65536) it has to compact often, but I'm worried about giving up accuracy and reliability if I drop to a lower quant to make room for more context:

llama-server -m /Models/q3.6/Qwen3.6-27B-Q5_K_S.gguf -c 65536 -ngl -1 -t 8 -ctk q8_0 -ctv q8_0 --chat-template-kwargs "{\"preserve_thinking\": true}" --spec-type draft-mtp --spec-draft-n-max 2 --fit off --mmproj /Models/q3.6/mmproj-Qwen3.6-27B-f16.gguf --no-mmproj-offload

I'm getting around 65 tok/s.

I've also seen these recommendations: https://github.com/noonghunna/club-3090/blob/master/docs/SINGLE_CARD.md

They seem to be using a Q4 quant. How are you weighing the tradeoffs?
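For anyone weighing the same thing, here is a rough back-of-envelope sketch of how the quant and KV cache type trade against each other in VRAM. The bits-per-weight figures are approximations (real GGUF files vary with the tensor mix), and the attention-layer count, KV head count and head dimension in the example are hypothetical placeholders, not the real Qwen 27B values; read the actual numbers from the GGUF metadata before trusting the result. It also ignores compute buffers and the MTP draft overhead.

```python
# Back-of-envelope VRAM estimate: quantized weights plus KV cache.
# All constants are approximations; architecture numbers below are placeholders.

GIB = 1024 ** 3

# Approximate average bits per weight for two common GGUF quants.
WEIGHT_BPW = {"Q4_K_M": 4.85, "Q5_K_S": 5.5}

# Bits per cached element. q8_0 stores 32 int8 values plus one fp16 scale
# (34 bytes per 32 elements); q4_0 stores 18 bytes per 32 elements.
KV_BITS = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def weights_gib(n_params: float, quant: str) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * WEIGHT_BPW[quant] / 8 / GIB


def kv_cache_gib(n_ctx: int, n_attn_layers: int, n_kv_heads: int,
                 head_dim: int, cache_type: str) -> float:
    """Approximate KV cache size in GiB, using one type for both K and V.

    Only layers that keep a full KV cache should be counted in n_attn_layers.
    """
    # K and V each hold n_kv_heads * head_dim elements per token per layer.
    elems = 2 * n_ctx * n_attn_layers * n_kv_heads * head_dim
    return elems * KV_BITS[cache_type] / 8 / GIB


if __name__ == "__main__":
    n_params = 27e9                                          # nominal parameter count
    arch = dict(n_attn_layers=16, n_kv_heads=4, head_dim=256)  # HYPOTHETICAL values

    for quant in ("Q5_K_S", "Q4_K_M"):
        for cache_type in ("f16", "q8_0"):
            total = (weights_gib(n_params, quant)
                     + kv_cache_gib(65536, cache_type=cache_type, **arch))
            print(f"{quant} + {cache_type} KV @ 64K ctx: ~{total:.1f} GiB")
```

Whatever is left of the 24 GB after weights is what context has to fit into, so the difference between the two weight sizes is roughly the extra KV cache (and therefore extra context before compaction) that the lower quant buys.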

u/GotHereLateNameTaken — 7 days ago

▲ 10 r/LocalLLaMA

MTP Incoming today!

github.com

u/GotHereLateNameTaken — 8 days ago

▲ 2 r/LocalLLaMA

How is Aion UI with local LLMs?

Anyone tried this?

How extensible is it?

Does it work well with Qwen 27B?

Does it bloat the context window or manage it well?

github.com

u/GotHereLateNameTaken — 10 days ago