u/skinnyzaz

I'm running 3x RTX 3090s and was using the Q8 version of the 27B on llama.cpp. I saw some posts about how fast the AutoRound int4 version was on 2x 3090s, so I tested it. It is insanely fast by comparison and seems to follow my ticket skill/workflow WAY better. The Q8 version seems to think about what it's doing and tries different things to complete the ticket, even though I have a ticket workflow built for it. The int4 AutoRound version seems to just follow the ticket and does a great job. From a few tests, a ticket that takes 5 min on int4 will sometimes take the Q8 version 15-20 min. Does this seem correct for Q8 when it comes to work like handling user tickets?

qwen3.6 27b int4 does user support tickets better and far faster than Q8