u/TechnoSmacked

Hello folks, I'm getting my hands on a Blackwell Pro Max-Q this week and will be getting to work immediately with vLLM. I'm working on a WRX90E-SAGE SE EEB, with a 9965x and 128 GB of RAM. I have done thorough research on the subject, but there are only so many answers out there, since the community that holds the artifact that is the 6000 Pro is made up of a few enthusiastic people. I'd like to start a thread on optimizing the parameters needed to run this model properly and efficiently on a single card. Quantizing, as we all know, does the job, but it's also poison to the LLM in real-life scenarios.
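To put rough numbers on the quantize-or-not question, here's a back-of-envelope sketch of the weight footprint alone. It assumes "27b" means 27 billion parameters and that the card has 96 GB of VRAM (check your own spec sheet), and it ignores KV cache, activations, and runtime overhead, so treat it as a starting point rather than a measurement.

```python
# Back-of-envelope weight footprint for a dense model at a few weight formats.
# Ignores KV cache, activations, and runtime overhead.

PARAMS = 27e9   # assumption: "27b" in the model name = 27 billion parameters
VRAM_GB = 96    # assumption: card capacity in GB, adjust to your hardware

# Bytes per parameter; the 4-bit figure ignores scale/zero-point overhead.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for name, nbytes in BYTES_PER_PARAM.items():
    gb = weight_gb(PARAMS, nbytes)
    print(f"{name}: ~{gb:.1f} GB weights, ~{VRAM_GB - gb:.1f} GB left for everything else")
```

Under those assumptions, unquantized 16-bit weights fit on a single card, and the real trade-off is how much room is left for context length and concurrent requests.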

So, what is everyone running? What are your generation speeds? Are you happy with the way the LLM is working? Do you use it mostly for agentic or code-based tasks?

P.S. Don't forget to build your workflow and plan with frontier models before feeding the information to a smaller model.

Qwen 3.6 27b. To quantize or not to quantize. That is the question.