u/rhpk

Hi all, I have four MI50 32GB cards on a single motherboard (an old Supermicro X10DRX with eight PCIe 3.0 x8 slots; not ideal, but I'm experimenting).

I'm running llama.cpp in Docker with images from:

https://github.com/mixa3607/ML-gfx906

but I've also tried other images with the same results.

Here's what happens. My GPUs seem to be split into two groups, 0+1 and 2+3. If I stick to one of these groups, llama.cpp (and also Ollama) works fine. If I use all four GPUs (0+1+2+3), or if I mix the groups (0+2, 0+3, 1+2, 1+3), I get:

ggml_cuda_compute_forward: SCALE failed
current device: 0, in function ggml_cuda_compute_forward at /build/llamacpp/ggml/src/ggml-cuda/ggml-cuda.cu:3114

followed by this backtrace:

[40461] libggml-base.so.0(+0x1addb)[0x7bf09bcbaddb]
[40461] libggml-base.so.0(ggml_print_backtrace+0x21c)[0x7bf09bcbb25c]
[40461] libggml-base.so.0(ggml_abort+0x15b)[0x7bf09bcbb43b]
[40461] /app/libggml-hip.so(+0x27f262)[0x7bf097ef7262]
[40461] /app/libggml-hip.so(+0x28a534)[0x7bf097f02534]
[40461] /app/libggml-hip.so(+0x2862a1)[0x7bf097efe2a1]
[40461] libggml-base.so.0(ggml_backend_sched_graph_compute_async+0x817)[0x7bf09bcd88c7]
[40461] libllama.so.0(_ZN13llama_context13graph_computeEP11ggml_cgraphb+0xa1)[0x7bf09be38a31]
[40461] libllama.so.0(_ZN13llama_context14process_ubatchERK12llama_ubatch14llm_graph_typeP22llama_memory_context_iR11ggml_status+0x114)[0x7bf09be3b124]
[40461] libllama.so.0(_ZN13llama_context6decodeERK11llama_batch+0x390)[0x7bf09be42630]
[40461] libllama.so.0(llama_decode+0xf)[0x7bf09be440ff]
[40461] libllama-common.so.0(_Z23common_init_from_paramsR13common_params+0x3ff)[0x7bf09c35e93f]
[40461] /app/llama-server(+0x11b668)[0x63bdd6185668]
[40461] /app/llama-server(+0x6bc41)[0x63bdd60d5c41]
[40461] /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7bf09b71c1ca]
[40461] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7bf09b71c28b]
[40461] /app/llama-server(+0x6c875)[0x63bdd60d6875]

I can also launch two Docker containers in parallel, each assigned to one of the two groups, and both work flawlessly, so I'm ruling out problems with the motherboard.
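To make the test matrix explicit, here is a minimal Python sketch that enumerates every multi-GPU subset of the four cards and flags which ones mix the two groups. It assumes the 0+1 / 2+3 grouping described above and that devices are selected with the HIP_VISIBLE_DEVICES environment variable; the helper names are hypothetical, and the Docker images may expose devices differently. Note that I only tested pairs and the full quad, so the three-GPU subsets are untested.

```python
from itertools import combinations

# Grouping as observed: GPUs 0+1 and 2+3 behave as two separate groups.
GROUPS = [{0, 1}, {2, 3}]


def group_of(dev: int) -> int:
    """Return the index of the group containing GPU `dev` (hypothetical helper)."""
    for i, group in enumerate(GROUPS):
        if dev in group:
            return i
    raise ValueError(f"GPU {dev} is not in any group")


def spans_groups(devs) -> bool:
    """True if the device set mixes GPUs from more than one group."""
    return len({group_of(d) for d in devs}) > 1


def test_matrix(n_gpus: int = 4):
    """Yield (HIP_VISIBLE_DEVICES value, crosses_groups) for every subset of 2+ GPUs."""
    for k in range(2, n_gpus + 1):
        for devs in combinations(range(n_gpus), k):
            yield ",".join(map(str, devs)), spans_groups(devs)


if __name__ == "__main__":
    for visible, crosses in test_matrix():
        label = "crosses groups" if crosses else "single group"
        print(f"HIP_VISIBLE_DEVICES={visible}: {label}")
```

In my runs, every "single group" pair works and every "crosses groups" pair (plus 0,1,2,3) hits the SCALE failure.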

I'm using ROCm 6.3.3. This is what llama.cpp reports at startup:

ggml_cuda_init: found 2 ROCm devices (Total VRAM: 65504 MiB):
Device 0: AMD Radeon Graphics, gfx906:sramecc-:xnack- (0x906), VMM: no, Wave Size: 64, VRAM: 32752 MiB
Device 1: AMD Radeon Graphics, gfx906:sramecc-:xnack- (0x906), VMM: no, Wave Size: 64, VRAM: 32752 MiB
load_backend: loaded ROCm backend from /app/libggml-hip.so
load_backend: loaded CPU backend from /app/libggml-cpu-haswell.so
version: 1 (81b0d88)
built with GNU 13.3.0 for Linux x86_64

Any ideas?

Quad MI50 setup - weird behaviour