u/GriffinDodd — reddlx

I picked up a GMKTec Max+ 395 96GB Evo-XT (same as Strix Halo) in the hope of running some medium-size models at home, and as long as I stick with Vulkan (ROCm has never managed to load a single model) and LM Studio, it's been pretty reliable.

I really wanted to try vLLM to see if there was a performance difference, but oh my lordy lordy, what a total nightmare of an experience.

I've tried sticking with some of the prebuilt Docker images that claim to specifically support the gfx1151 architecture and ROCm 7+, but I haven't been able to get a single one to actually serve a model.

I've specifically tried these two most-recommended builds:

https://rocm.docs.amd.com/projects/radeon-ryzen/en/latest/docs/advanced/advancedryz/linux/llm/build-docker-image.html

and

https://github.com/kyuz0/amd-strix-halo-vllm-toolboxes

Neither of these works out of the box. I've gone down a lot of rabbit holes with environment variables like:

export HIP_VISIBLE_DEVICES=0
export VLLM_WORKER_MULTIPROC_METHOD=spawn
export PYTORCH_ROCM_ARCH=gfx1151
export TORCH_BLAS_PREFER_HIPBLASLT=1

I've updated transformers and tried updating vLLM (it pulls in CUDA builds). I've done all the BIOS and memory tweaks (in LM Studio this rig happily runs Qwen3.5 122B A10B Q4 with an 88,000-token context window, with no crashes or OOMs).
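
For anyone wanting to confirm the "pulls in CUDA builds" part on their own setup, here's a rough sketch of a check. It relies on PyTorch reporting a HIP version in torch.version.hip on ROCm wheels and a CUDA version in torch.version.cuda on CUDA wheels. The function name classify_torch_build is just made up for illustration, and this only tells you which kind of wheel is installed, not whether gfx1151 is actually supported by it.

```shell
# Classify a PyTorch build from its reported HIP and CUDA version strings.
# Assumption: ROCm wheels set torch.version.hip, CUDA wheels set
# torch.version.cuda, and the unset one prints as "None".
classify_torch_build() {
  if [ -n "$1" ] && [ "$1" != "None" ]; then
    echo "rocm"
  elif [ -n "$2" ] && [ "$2" != "None" ]; then
    echo "cuda"
  else
    echo "cpu"
  fi
}

# Only query torch when it is importable in the current environment.
if python3 -c 'import torch' 2>/dev/null; then
  hip=$(python3 -c 'import torch; print(torch.version.hip)')
  cuda=$(python3 -c 'import torch; print(torch.version.cuda)')
  echo "torch build: $(classify_torch_build "$hip" "$cuda")"
else
  echo "torch is not importable here"
fi
```

If this prints "cuda" after a pip upgrade inside the container, the upgrade most likely replaced the ROCm wheel, which would explain why nothing loads afterwards.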

I upgraded to Ubuntu 26 for the ROCm support, but that's not much help inside containers, of course.
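
On the container side, one thing that can be sanity-checked is whether the GPU device nodes are visible at all from inside the container. ROCm containers normally need /dev/kfd and /dev/dri passed through from the host. The helper below is only a sketch (check_rocm_devices is a hypothetical name), and finding the nodes doesn't prove the ROCm stack inside the image works, only that the passthrough happened.

```shell
# Report which of the given device paths exist.
# Returns 0 if all are present, 1 if any is missing.
check_rocm_devices() {
  missing=0
  for node in "$@"; do
    if [ -e "$node" ]; then
      echo "found: $node"
    else
      echo "missing: $node"
      missing=1
    fi
  done
  return "$missing"
}

# /dev/kfd is the ROCm compute interface; /dev/dri holds the render nodes.
check_rocm_devices /dev/kfd /dev/dri \
  || echo "GPU device nodes are not fully visible in this environment"
```

Run it inside the container, not on the host. If either node is missing there, the container was probably started without the matching --device options.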

Has anyone got ROCm working properly for vLLM on this platform?