u/v01dm4n

Platform: Ubuntu 26.04, RTX 5060Ti, NVIDIA Driver 595.71.05, CUDA 13.2

I downloaded the IQ3_XXS quant and tried running it with llama.cpp both with and without the newly introduced spec-type argument. In both cases, the model outputs random characters. Here's one of the commands I used:

./llama-server -m ~/.lmstudio/models/unsloth/Qwen3.6-27B-MTP-GGUF/Qwen3.6-27B-UD-IQ3_XXS.gguf --no-mmproj -ngl 99 -c 8192 -np 1

I can run the regular models but not this one. Am I missing something, or does this quant have a problem?

Sample output:

1   .111 111111111 1049  . 21 1. 1.111 10 1.  .1

11.. .  A .

Update: This is a known issue: compiling llama.cpp against CUDA 13.2 makes it produce garbage with all variants (MTP and non-MTP). Downgrading to CUDA 12.8 solved it. However, that isn't simple on Ubuntu 26.04.
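For anyone wanting to check which toolkit their build would pick up, here is a minimal sketch that parses the release number out of `nvcc --version`. The helper names `cuda_release` and `check_cuda_release` are made up for illustration, and the only version it flags is 13.2, the one reported as problematic in this thread.

```shell
# Read `nvcc --version` output on stdin and print the "major.minor" release,
# e.g. "Cuda compilation tools, release 12.8, V12.8.61" gives 12.8.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: returns non-zero for the release reported in this
# thread (13.2) and when no release could be parsed.
check_cuda_release() {
    case "$1" in
        13.2) echo "CUDA $1: reported in this thread to give garbage output"; return 1 ;;
        "")   echo "could not determine the CUDA release"; return 2 ;;
        *)    echo "CUDA $1: not the release reported as affected here"; return 0 ;;
    esac
}
```

Usage would be something like `check_cuda_release "$(nvcc --version | cuda_release)"` before running cmake.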

I recently upgraded to Ubuntu 26.04 for its NPU support, but the recommended CUDA toolkit on 26.04 is 13.2. I managed to install older versions of the toolkit using the deb installers from the NVIDIA site, but llama.cpp compilation then fails because of a glibc incompatibility. Eventually I had to use Docker and set up nvidia/cuda:12.8.0-devel-ubuntu24.04 for compilation.
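A rough sketch of what the Docker route could look like, for reference. Only the image name comes from the post. The source path, the apt package list, the `build-cuda128` directory, and the `build_in_container` helper are assumptions, as is `-DCMAKE_CUDA_ARCHITECTURES=120` (set explicitly on the assumption that the RTX 50 series is compute capability 12.0 and that no GPU is visible to the container at build time).

```shell
# Image named in the post: CUDA 12.8 toolkit on an Ubuntu 24.04 base.
IMAGE="nvidia/cuda:12.8.0-devel-ubuntu24.04"

# Build a llama.cpp checkout (path given as $1) inside the container.
# Set DOCKER=echo to print the docker command instead of running it.
build_in_container() {
    src="$1"
    ${DOCKER:-docker} run --rm -v "$src":/src -w /src "$IMAGE" sh -c '
        apt-get update &&
        apt-get install -y --no-install-recommends \
            build-essential cmake git libcurl4-openssl-dev &&
        cmake -B build-cuda128 -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=120 &&
        cmake --build build-cuda128 --config Release -j "$(nproc)"
    '
}
```

Calling `build_in_container ~/llama.cpp` should leave the binaries under `build-cuda128/bin` in the checkout. They link against the CUDA 12 runtime libraries, so those need to be available wherever the server is run, either on the host or by running it inside the same image.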

Now happy with a jump from 30tps to 40tps :)

Garbage output while trying to run IQ3_XXS variant of unsloth/Qwen3.6-27B-MTP-GGUF with llama.cpp