▲ 14 r/MacPro2019LocalAI+1 crossposts

Mac Pro 2019 for inference: success

I just finished setting up my Mac Pro 2019 as an LLM server. It has 12 cores, 96 GB of DDR4, 2 TB of storage, and, more importantly, a Radeon Pro Vega II with 32 GB of VRAM.

On the software side, I'm running a headless NixOS server with llama.cpp.

So far, I'm impressed: qwen3.6-27B-UD-Q5_K_S runs at a rock-steady 26-27 tk/s, which I consider very usable after slumming below 10 tk/s for the same dense model at Q3 on a MacBook M4 with 32 GB of RAM. That's the only model I've tested so far. At 19 GB, it leaves plenty of room for KV cache.

I expect most of the tinkering will go into finding the best trade-off between dense model size and KV cache room, and then of course testing some MoEs.
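To put rough numbers on that trade-off, here is a minimal sketch of the usual KV cache estimate. It assumes a plain transformer with full attention in every layer and an f16 cache; the layer count, KV head count, and head size below are hypothetical placeholders, not values from the actual model card, so treat the result as an illustration only.

```python
GIB = 1024 ** 3

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Approximate KV cache size, assuming full attention in every layer."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

# Hypothetical architecture values (assumptions, not the real model's config).
cache = kv_cache_bytes(n_layers=64, n_kv_heads=8, head_dim=128, n_ctx=32768)

# Figures from the post: 32 GB of VRAM, about 19 GB used by the model file.
headroom_gb = 32 - 19

print(f"KV cache for the hypothetical config: {cache / GIB:.1f} GiB")
print(f"VRAM left after the model: about {headroom_gb} GB")
```

Swapping in the real values from the model's metadata, or a smaller `bytes_per_elem` for a quantized cache, shows how much context fits in the remaining VRAM.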

My plan is to run it as a backend for pi.dev, and to serve the rest of the household with a chat interface running on my Proxmox server.

If you are lucky enough to have one of these beasts lying around, you could do worse than turning it into an LLM server.


u/Weeblewobbly — 4 days ago

▲ 7 r/MacPro2019LocalAI+2 crossposts

AMD Radeon PRO V620 on Ubuntu bare-metal: PCI BAR / SR-IOV resource issue with multiple GPUs

TL;DR

What did you do to get your V620 GPUs to work?
How did you get around the cards requesting a ridiculous 384 GB of BAR space per card for their SR-IOV VFs?

(Disclaimer: I used AI to help me gather all the data and present it in this post cleanly.)

I wanted to share an issue I ran into while trying to use AMD Radeon PRO V620 GPUs on Ubuntu bare-metal for AI workloads, and I’m curious if anyone else has seen the same thing.

Setup

Ubuntu 24.04.4
ROCm 7.2.3
Mac Pro 2019
Cubix Xpander PCIe expansion chassis
AMD Radeon PRO V620 GPUs
Bare-metal Linux only
No virtualization
No passthrough
No MxGPU use case

The goal was simple: use the V620s as normal ROCm GPUs for AI inference.

The problem

The V620s were visible to the system through PCIe, but they did not initialize as usable ROCm GPUs.

lspci showed the cards correctly as:

AMD/ATI Navi 21 [Radeon Pro V620] [1002:73a1]

But they only showed:

Kernel modules: amdgpu

not:

Kernel driver in use: amdgpu

rocm-smi either showed no V620s or only the unrelated internal GPUs, depending on the configuration.

Resource allocation looked broken

The sysfs resource files for the V620s were all zeroed out:

/sys/bus/pci/devices/0000:xx:00.0/resource

0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
...
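For anyone who wants to run the same check on several cards, here is a small sketch that parses the contents of a sysfs `resource` file and reports whether every region is unassigned. It assumes the usual three-column layout (start, end, flags, all in hex); the function names are made up for this example, and the sample text is just the zeroed lines quoted above.

```python
def parse_pci_resource(text):
    """Parse sysfs 'resource' contents into (start, end, flags) tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or truncated lines
        start, end, flags = (int(field, 16) for field in fields)
        regions.append((start, end, flags))
    return regions

def all_unassigned(regions):
    """True when at least one region was parsed and none has an address range."""
    return bool(regions) and all(s == 0 and e == 0 for s, e, _ in regions)

# Sample input copied from the zeroed output shown in the post.
sample = """\
0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 0x0000000000000000
"""

print(all_unassigned(parse_pci_resource(sample)))
```

On a real system the text would come from reading the `resource` file of the device in question.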

The V620s also exposed SR-IOV capability even though I was not using virtualization:

sriov_totalvfs=12
sriov_numvfs=0

The SR-IOV capability block showed:

Initial VFs: 12
Total VFs: 12
Number of VFs: 0
VF Device ID: 73ae

The confusing part was that SR-IOV was not actually enabled:

IOVCtl: Enable-
Number of VFs: 0

dmesg errors

During PCI resource allocation, the kernel appeared to account for the possible VF BARs anyway.

The dmesg output had errors like:

BAR 0 [mem size 0x800000000 64bit pref]: can't assign; no space
VF BAR 0 [mem size 0x6000000000 64bit pref]: can't assign; no space
VF BAR 0 [mem size 0x6000000000 64bit pref]: failed to assign

After that, forcing a driver probe did not help. The V620 remained unbound, resources stayed zero, and amdgpu failed during initialization.
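The hex sizes in those dmesg lines are easier to reason about once converted. The sketch below pulls the `mem size` values out of the quoted messages and prints them in GiB; the regular expression and function name are my own, written only against the three lines shown above.

```python
import re

GIB = 1024 ** 3

# Matches "BAR 0 [mem size 0x..." and "VF BAR 0 [mem size 0x...".
SIZE_RE = re.compile(r"((?:VF )?BAR \d+) \[mem size (0x[0-9a-fA-F]+)")

def bar_sizes_gib(dmesg_text):
    """Map each BAR name found in the text to its size in GiB."""
    sizes = {}
    for line in dmesg_text.splitlines():
        match = SIZE_RE.search(line)
        if match:
            sizes[match.group(1)] = int(match.group(2), 16) / GIB
    return sizes

# The three messages quoted in the post.
log = """\
BAR 0 [mem size 0x800000000 64bit pref]: can't assign; no space
VF BAR 0 [mem size 0x6000000000 64bit pref]: can't assign; no space
VF BAR 0 [mem size 0x6000000000 64bit pref]: failed to assign
"""

sizes = bar_sizes_gib(log)
for name, gib in sizes.items():
    print(f"{name}: {gib:.0f} GiB")
```

That gives 32 GiB for the PF BAR 0 and 384 GiB for VF BAR 0. Dividing 384 by the 12 total VFs gives 32 GiB each, which would be consistent with one PF-sized BAR being reserved per possible VF, though that is an inference from the numbers rather than something the logs state.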

Things I tested

While narrowing it down, I tested:

Removing other GPUs
Removing the Apple I/O card
Testing one Cubix cable / one side of the expander
Confirming no ReBAR resize service was active
Confirming sriov_numvfs=0
Setting sriov_drivers_autoprobe=0
Trying late amdgpu probing after boot
Testing boot arguments such as:

pci=realloc
iommu=pt
amdgpu.ras_enable=0

The pattern stayed the same: the cards were present on PCIe, but the V620 BAR resources failed before amdgpu could bind. From the logs, the issue looked related to the very large advertised SR-IOV VF BAR space.

Question

Has anyone else run into this with AMD Radeon PRO V620, especially in bare-metal Linux / ROCm use rather than virtualization?

I’m especially interested in hearing from anyone who has used:

V620 on Ubuntu bare-metal
Multiple V620s in one host
V620 behind PCIe switches or expansion chassis
Cubix or external PCIe expansion systems
ROCm with V620
SR-IOV-capable AMD GPUs where SR-IOV is not actually being used

Did your system allocate the PF BARs normally, or did the VF BARs cause PCI resource allocation problems?

What did you do to overcome this problem?


u/Faisal_Biyari — 7 days ago

▲ 4 r/MacPro2019LocalAI

W5700X on Ubuntu T2 running only in low power

So, weird issue: I decided to give Linux a go on the Mac Pro and see what llama.cpp was like versus Windows with LM Studio. Everything set up nice and easy, but when running llama.cpp I was only getting 14 tk/s on a model that was doing 55 tk/s on Windows.

After playing with the llama.cpp options a bit I managed to get to 21 tk/s, but that was still way off where I expected it to be.

I opened up nvtop and could see the memory and GPU load, but then I noticed that the sclk wasn't going up by much: 300-500 MHz and that's it. Power draw was barely 40 W per card.

Anybody aware of anything I need to look at on this setup to get the cards running properly?

Quite astonishing, actually, that it can do 21 tk/s at essentially minimal power… but I would like to see what it can do at full throttle (hopefully beat Windows a touch).
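For comparing notes, the clock levels can also be read outside nvtop. The amdgpu driver exposes a `pp_dpm_sclk` file under the card's sysfs device directory (for example `/sys/class/drm/card0/device/`, where the card index is an assumption and will differ per system) that lists the available sclk levels and marks the active one with `*`. The sketch below only parses that text; the sample levels are invented for illustration and are not readings from a W5700X.

```python
import re

# Each line looks like "0: 300Mhz *" (the star marks the active level).
LEVEL_RE = re.compile(r"(\d+):\s*(\d+)\s*mhz\s*(\*)?", re.IGNORECASE)

def parse_pp_dpm_sclk(text):
    """Return (list of level clocks in MHz, active clock in MHz or None)."""
    levels = []
    active = None
    for line in text.splitlines():
        match = LEVEL_RE.match(line.strip())
        if not match:
            continue
        mhz = int(match.group(2))
        levels.append(mhz)
        if match.group(3):
            active = mhz
    return levels, active

# Hypothetical sample content, NOT captured from real hardware.
sample = "0: 300Mhz *\n1: 1000Mhz\n2: 1900Mhz\n"

levels, active = parse_pp_dpm_sclk(sample)
print(f"levels={levels} active={active} max={max(levels)}")
```

If the active level stays at the bottom entry under load, that matches what nvtop was showing; the neighbouring `power_dpm_force_performance_level` file reports which performance policy the driver is currently using.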


u/Hephaestite — 9 days ago