
Quad MI50 setup - weird behaviour
Hi all, I have a quad MI50 32GB setup on the same motherboard (an old Supermicro X10DRX with 8 PCIe 3.0 x8 slots, not ideal but I'm experimenting).
I'm using llama.cpp in docker with images from:
https://github.com/mixa3607/ML-gfx906
but also tried other ones with the same results.
This is what happens. It seems like my GPUs are grouped into two groups, 0+1 and 2+3. If I stick to just one of these groups, llama.cpp (but also Ollama) works fine. If I use the full quad GPU (so 0+1+2+3) or if I mix the groups (like 0+2, 0+3, 1+2, 1+3) I get:
ggml_cuda_compute_forward: SCALE failed
current device: 0, in function ggml_cuda_compute_forward at /build/llamacpp/ggml/src/ggml-cuda/ggml-cuda.cu:3114
and a bunch of trace-back messages:
[40461] libggml-base.so.0(+0x1addb)[0x7bf09bcbaddb]
[40461] libggml-base.so.0(ggml_print_backtrace+0x21c)[0x7bf09bcbb25c]
[40461] libggml-base.so.0(ggml_abort+0x15b)[0x7bf09bcbb43b]
[40461] /app/libggml-hip.so(+0x27f262)[0x7bf097ef7262]
[40461] /app/libggml-hip.so(+0x28a534)[0x7bf097f02534]
[40461] /app/libggml-hip.so(+0x2862a1)[0x7bf097efe2a1]
[40461] libggml-base.so.0(ggml_backend_sched_graph_compute_async+0x817)[0x7bf09bcd88c7]
[40461] libllama.so.0(_ZN13llama_context13graph_computeEP11ggml_cgraphb+0xa1)[0x7bf09be38a31]
[40461] libllama.so.0(_ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status+0x114)[0x7bf09be3b124]
[40461] libllama.so.0(_ZN13llama_context6decodeERK11llama_batch+0x390)[0x7bf09be42630]
[40461] libllama.so.0(llama_decode+0xf)[0x7bf09be440ff]
[40461] libllama-common.so.0(_Z23common_init_from_paramsR13common_params+0x3ff)[0x7bf09c35e93f]
[40461] /app/llama-server(+0x11b668)[0x63bdd6185668]
[40461] /app/llama-server(+0x6bc41)[0x63bdd60d5c41]
[40461] /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7bf09b71c1ca]
[40461] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7bf09b71c28b]
[40461] /app/llama-server(+0x6c875)[0x63bdd60d6875]
I can also launch two Docker containers in parallel, each assigned to one of the two groups, and they work flawlessly, so I'm ruling out problems related to the motherboard.
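To make the test matrix explicit, here is a minimal Python sketch that lists the combinations described above together with the outcome I saw for each. It only prints the matrix and does not launch anything. Two assumptions: the devices are selected with the ROCm environment variable HIP_VISIBLE_DEVICES (your container setup may pick GPUs differently), and the names GROUPS, crosses_groups and test_matrix are just made up for this example.

```python
from itertools import combinations

# Observed grouping: devices 0+1 work together, and so do 2+3.
GROUPS = [{0, 1}, {2, 3}]


def group_of(device):
    """Return the index of the group that contains the given device."""
    for index, group in enumerate(GROUPS):
        if device in group:
            return index
    raise ValueError(f"device {device} is not in any group")


def crosses_groups(devices):
    """True if the selection mixes devices from more than one group."""
    return len({group_of(d) for d in devices}) > 1


def test_matrix(num_devices=4):
    """List every GPU pair plus the full set, with the reported outcome."""
    rows = []
    for size in (2, num_devices):
        for combo in combinations(range(num_devices), size):
            # Assumption: GPUs are selected via HIP_VISIBLE_DEVICES.
            env = "HIP_VISIBLE_DEVICES=" + ",".join(str(d) for d in combo)
            outcome = "fails" if crosses_groups(combo) else "works"
            rows.append((combo, env, outcome))
    return rows


if __name__ == "__main__":
    for combo, env, outcome in test_matrix():
        print(f"{env:<30} {outcome}")
```

Only the pairs 0,1 and 2,3 come out as "works"; every selection that spans both groups, including all four GPUs, is marked "fails", which matches what I see in practice.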
I'm using ROCm 6.3.3. For llama.cpp, this is what I get:
ggml_cuda_init: found 2 ROCm devices (Total VRAM: 65504 MiB):
Device 0: AMD Radeon Graphics, gfx906:sramecc-:xnack- (0x906), VMM: no, Wave Size: 64, VRAM: 32752 MiB
Device 1: AMD Radeon Graphics, gfx906:sramecc-:xnack- (0x906), VMM: no, Wave Size: 64, VRAM: 32752 MiB
load_backend: loaded ROCm backend from /app/libggml-hip.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
version: 1 (81b0d88)
built with GNU 13.3.0 for Linux x86_64
Any ideas?