





I am very excited about the future of local AI. With the spread of AI agents, the amount of VRAM now achievable locally, the quality of small and medium LLMs, and the community growing around all of this, the future is looking very good.
I am writing this to document my successes with the following:
I wrote a couple of older guides. Check them out for reference as needed:
I'm going to focus on setting up Ubuntu and all the packages needed for the infrastructure of local AI.
Important: This is an experimental community guide. Some parts involve patched kernels, unsupported GPU configurations, and boot-level PCIe changes. This worked for my Mac Pro 2019 systems, but you should expect troubleshooting, and you should be comfortable recovering from a failed boot. I am not responsible for any outcome of using this guide, whether it be positive, negative, or anything in between.
linux-hwe-6.17 source package, to support the Infinity Fabric Link Bridge. Alternative: Standard Ubuntu kernel.
Please let me know if the GitHub links do not work.
These are the choices I made, and I am still refining them. They work for me. Keep in mind that this is all held together with the digital equivalent of duct tape. If you change anything, it may or may not work. If you do, I would genuinely appreciate hearing what you tried, what worked, what failed, and why you changed it.
Step 00: Infinity Fabric Link (Jumper & Bridge)
Before you start, please remove the Infinity Fabric Link Jumper(s) or Bridge from the GPUs. Stock Ubuntu 24.04 kernels, up to and including 6.17, do not currently support it.
Specifically, with the link installed, none of the GPUs will work on kernel 6.8. After upgrading to 6.17, only one GPU will work.
If you have an Infinity Fabric Link Jumper or Bridge, follow the patch section later in the guide to make it work with your GPUs.
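To see which of the two kernel cases above applies to your system, you can compare the running kernel release against 6.17. This is only a minimal sketch: `kernel_at_least` is a hypothetical helper name of mine, not part of the linked scripts, and it only looks at the major and minor numbers reported by `uname -r`.

```shell
# Hypothetical helper: succeeds if a kernel release is at least MAJOR.MINOR.
# Usage: kernel_at_least MAJOR MINOR [RELEASE]; RELEASE defaults to `uname -r`.
kernel_at_least() (
  want_major=$1
  want_minor=$2
  release=${3:-$(uname -r)}
  major=${release%%.*}       # e.g. "6.17.0-5-generic" -> "6"
  rest=${release#*.}         # -> "17.0-5-generic"
  minor=${rest%%[!0-9]*}     # -> "17"
  [ "$major" -gt "$want_major" ] ||
    { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
)

if kernel_at_least 6 17; then
  echo "Kernel $(uname -r) is 6.17 or newer"
else
  echo "Kernel $(uname -r) is older than 6.17"
fi
```

Either way, the jumper or bridge should stay off until the patched kernel from the later section is installed.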
Step 01: Update, Upgrade, and Tweak the System
What we will do:
Copy the following command into your command line interface of choice:
curl -fsSL "https://raw.githubusercontent.com/FaisalBiyari/MacPro2019LocalAI/refs/heads/main/Reddit/Mac%20Pro%202019%20Local%20AI%20Guide%3A%20Ubuntu%2024.04%2C%20ROCm%207.2.3%2C%20PyTorch%202.10%2C%20and%20Infinity%20Fabric%20Link/2.%20Setting%20up%20Ubuntu%20after%20Installation/Step%2001%3A%20Update%2C%20Upgrade%2C%20and%20Tweak%20the%20System" | bash
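Piping `curl` straight into `bash` runs whatever the URL returns. If you would rather read each step's script before running it, something like the following should work for any step in this guide. It is a sketch, not part of the guide's scripts: `fetch_step` is a hypothetical helper, and `STEP_URL` stands for the quoted URL from whichever step you are on.

```shell
# Hypothetical helper: download a step script to a temp file and print its path.
fetch_step() (
  url=$1
  out=$(mktemp) || exit 1
  # Same curl flags as the guide's commands, but saved to a file instead of piped.
  curl -fsSL "$url" -o "$out" || { rm -f "$out"; exit 1; }
  printf '%s\n' "$out"
)

# Usage (STEP_URL is a placeholder for the quoted URL of the current step):
#   script=$(fetch_step "$STEP_URL") && less "$script" && bash "$script"
```

The end result is the same as the one-line command. You just get a chance to review the script first.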
Step 02: Install T2 Linux Repository
Since we are using a Mac Pro 2019, which is a Mac with a T2 chip, some additional packages are required to communicate properly with the hardware.
What we will do:
applesmc-t2 apple-bce t2fanrd
Copy the following command into your command line interface of choice:
curl -fsSL "https://raw.githubusercontent.com/FaisalBiyari/MacPro2019LocalAI/refs/heads/main/Reddit/Mac%20Pro%202019%20Local%20AI%20Guide%3A%20Ubuntu%2024.04%2C%20ROCm%207.2.3%2C%20PyTorch%202.10%2C%20and%20Infinity%20Fabric%20Link/2.%20Setting%20up%20Ubuntu%20after%20Installation/Step%2002%3A%20Install%20T2%20Linux%20Repository" | bash
Step 03: Enable T2 Fan Daemon
After installing the T2 packages, the command below activates the fan service.
What we will do:
t2fanrd systemd service
Copy the following command into your command line interface of choice:
curl -fsSL "https://raw.githubusercontent.com/FaisalBiyari/MacPro2019LocalAI/refs/heads/main/Reddit/Mac%20Pro%202019%20Local%20AI%20Guide%3A%20Ubuntu%2024.04%2C%20ROCm%207.2.3%2C%20PyTorch%202.10%2C%20and%20Infinity%20Fabric%20Link/2.%20Setting%20up%20Ubuntu%20after%20Installation/Step%2003%3A%20Enable%20T2%20Fan%20Daemon" | bash
Step 03-Optional: Set Fans to Maximum
I do not trust Apple's cooling. I would rather wear out the fans and replace them for a few dollars than have the GPUs (especially the Duo models) damaged by overheating.
What we will do:
Copy the following command into your command line interface of choice:
curl -fsSL "https://raw.githubusercontent.com/FaisalBiyari/MacPro2019LocalAI/refs/heads/main/Reddit/Mac%20Pro%202019%20Local%20AI%20Guide%3A%20Ubuntu%2024.04%2C%20ROCm%207.2.3%2C%20PyTorch%202.10%2C%20and%20Infinity%20Fabric%20Link/2.%20Setting%20up%20Ubuntu%20after%20Installation/Step%2003-Optional%3A%20Set%20Fans%20to%20Maximum" | bash
Step 04: Download and Install ROCm 7.2.3
This section will install ROCm 7.2.3, but it will NOT install dkms or amdgpu drivers. I opted to use the kernel driver, drm/amdgpu, so I can later patch it to support the Infinity Fabric Link Bridge.
What we will do:
Copy the following command into your command line interface of choice:
curl -fsSL "https://raw.githubusercontent.com/FaisalBiyari/MacPro2019LocalAI/refs/heads/main/Reddit/Mac%20Pro%202019%20Local%20AI%20Guide%3A%20Ubuntu%2024.04%2C%20ROCm%207.2.3%2C%20PyTorch%202.10%2C%20and%20Infinity%20Fabric%20Link/2.%20Setting%20up%20Ubuntu%20after%20Installation/Step%2004%3A%20Download%20and%20Install%20ROCm%207.2.3" | bash
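Since this step relies on the in-kernel drm/amdgpu driver rather than a DKMS build, it can be worth confirming which amdgpu module your kernel would actually load. This is a rough sketch: `amdgpu_origin` is a hypothetical helper, and the path patterns are my assumption about the usual Ubuntu layout (DKMS modules under `updates/dkms`, in-kernel modules under `kernel/drivers`).

```shell
# Hypothetical helper: classify an amdgpu module path as DKMS or in-kernel.
# The path patterns are assumptions based on the usual Ubuntu module layout.
amdgpu_origin() (
  case $1 in
    */updates/dkms/*) echo "dkms" ;;
    */kernel/drivers/gpu/drm/amd/amdgpu/*) echo "in-kernel" ;;
    *) echo "unknown" ;;
  esac
)

if command -v modinfo >/dev/null 2>&1; then
  # `modinfo -n` prints only the filename of the module it would load.
  path=$(modinfo -n amdgpu 2>/dev/null) || path=""
  echo "amdgpu module: ${path:-not found} ($(amdgpu_origin "$path"))"
fi
```

If this reports `dkms`, a DKMS driver is installed alongside the kernel one, and the patch section later in this guide may not apply to the module that is actually loaded.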
Step 05: Install Python Tools
We will be using Python and pip to install several packages for local AI. The following command sets up the correct versions, along with some quality-of-life choices.
What we will do:
2to3 python-is-python3 python3-pip python3-venv python3-dev python3-setuptools
pip wheel setuptools
numpy 1.26.4 specifically, system wide
Copy the following command into your command line interface of choice:
curl -fsSL "https://raw.githubusercontent.com/FaisalBiyari/MacPro2019LocalAI/refs/heads/main/Reddit/Mac%20Pro%202019%20Local%20AI%20Guide%3A%20Ubuntu%2024.04%2C%20ROCm%207.2.3%2C%20PyTorch%202.10%2C%20and%20Infinity%20Fabric%20Link/2.%20Setting%20up%20Ubuntu%20after%20Installation/Step%2005%3A%20Install%20Python%20Tools" | bash
Step 06: Install PyTorch & Other ROCm Related Wheels
Not everything here is needed for everyone. I included what I could, what worked, and what had some value to some local AI use case.
What we will do:
Copy the following command into your command line interface of choice:
curl -fsSL "https://raw.githubusercontent.com/FaisalBiyari/MacPro2019LocalAI/refs/heads/main/Reddit/Mac%20Pro%202019%20Local%20AI%20Guide%3A%20Ubuntu%2024.04%2C%20ROCm%207.2.3%2C%20PyTorch%202.10%2C%20and%20Infinity%20Fabric%20Link/2.%20Setting%20up%20Ubuntu%20after%20Installation/Step%2006%3A%20Install%20PyTorch%20%26%20Other%20ROCm%20Related%20Wheels" | bash
Step 07: Verifying Everything
We have now installed everything in the standard way. All that is left is to verify that everything is set up correctly.
What we will do:
Copy the following command into your command line interface of choice:
curl -fsSL "https://raw.githubusercontent.com/FaisalBiyari/MacPro2019LocalAI/refs/heads/main/Reddit/Mac%20Pro%202019%20Local%20AI%20Guide%3A%20Ubuntu%2024.04%2C%20ROCm%207.2.3%2C%20PyTorch%202.10%2C%20and%20Infinity%20Fabric%20Link/2.%20Setting%20up%20Ubuntu%20after%20Installation/Step%2007%3A%20Verifying%20Everything" | bash
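If you only want a quick manual spot-check rather than the full script, something along these lines should do. It is a minimal sketch and not a copy of the linked script: `have` is a hypothetical helper, and I am assuming the ROCm tools are on your PATH (they may instead live under `/opt/rocm/bin`). ROCm builds of PyTorch report GPUs through the `torch.cuda` namespace, which is why the check uses it.

```shell
# Hypothetical helper: report whether a command is on PATH.
have() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

have rocminfo || true
have rocm-smi || true
if have python3; then
  # Prints the PyTorch version and how many GPUs it can see (0 if none).
  python3 -c 'import torch; print("torch", torch.__version__, "gpus:", torch.cuda.device_count() if torch.cuda.is_available() else 0)' \
    || echo "PyTorch is not importable in this Python environment"
fi
```

A GPU count of 0 with PyTorch installed usually means the driver or ROCm side needs another look before moving on.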
AMD released several GPUs specifically for the Mac Pro 2019 that support their Infinity Fabric.
These GPUs and the Infinity Fabric Links are discussed in these posts:
The first set of GPUs that support it were the AMD Radeon PRO Vega II & Vega II Duo. The PC equivalent is an AMD Radeon PRO VII, which also supports an Infinity Fabric Link.
The second set of GPUs are the AMD Radeon PRO W6800X, W6800X Duo, and W6900X. These GPUs are in the Sienna Cichlid family, also referred to as RDNA2.
At the announcement of the Sienna Cichlid family, these GPUs were marketed as supporting xGMI. The Infinity Fabric Link is the physical bridge / jumper. xGMI is the software path that allows the GPUs to communicate over that link. However, on release, only the Apple MPX GPUs actually supported the Infinity Fabric Links, while the standard versions did not.
This might explain why support for xGMI on Sienna Cichlid was added between 2019 and 2020 to the Linux kernel drm/amdgpu, but later removed in 2022.
Many of us here in the subreddit tried to figure out the problem with the Infinity Fabric Link and find a solution to it. One redditor actually cracked it, creating a patch to the current kernel drm/amdgpu driver which, in my testing, seems to have completely solved the Infinity Fabric Link regression that happened in 2022.
You'll need to keep in mind that this is just the first step. While we are moving forward, there is still the question of ROCm support, HIP support, and everything else.
Step 01: Download, Build, & Install the Patched Kernel Files
Let's start. We will do the following:
Copy the following command into your command line interface of choice:
curl -fsSL "https://raw.githubusercontent.com/FaisalBiyari/MacPro2019LocalAI/refs/heads/main/Reddit/Mac%20Pro%202019%20Local%20AI%20Guide%3A%20Ubuntu%2024.04%2C%20ROCm%207.2.3%2C%20PyTorch%202.10%2C%20and%20Infinity%20Fabric%20Link/3.%20Infinity%20Fabric%20Link%20Jumper-Bridge/Step%2001%3A%20Download%2C%20Build%2C%20%26%20Install%20the%20Patched%20Kernel%20Files" | bash
With this, you are now the proud user of a patched kernel that supports the Infinity Fabric Links on the Sienna Cichlid MPX GPUs.
At this point, shut the system down, reinstall the Infinity Fabric Link Jumper or Bridge, then boot back into the patched kernel.
Step 02: Verify Patched Kernel & GPU Initialization
We should probably run a verification one last time. Copy the following command into your command line interface of choice:
curl -fsSL "https://raw.githubusercontent.com/FaisalBiyari/MacPro2019LocalAI/refs/heads/main/Reddit/Mac%20Pro%202019%20Local%20AI%20Guide%3A%20Ubuntu%2024.04%2C%20ROCm%207.2.3%2C%20PyTorch%202.10%2C%20and%20Infinity%20Fabric%20Link/3.%20Infinity%20Fabric%20Link%20Jumper-Bridge/Step%2002%3A%20Verify%20Patched%20Kernel%20%26%20GPU%20Initialization" | bash
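For a quick look at whether the GPUs now see each other over xGMI rather than only over PCIe, the link-type table from `rocm-smi` is a reasonable place to start. This is a sketch under assumptions: `count_xgmi_links` is a hypothetical helper, I am assuming `rocm-smi` is on your PATH, and I am assuming its `--showtopotype` table labels xGMI links as `XGMI`.

```shell
# Hypothetical helper: count "XGMI" entries in text read from stdin.
count_xgmi_links() {
  grep -o 'XGMI' | wc -l | tr -d ' '
}

if command -v rocm-smi >/dev/null 2>&1; then
  # --showtopotype prints the link type between each pair of GPUs.
  n=$(rocm-smi --showtopotype 2>/dev/null | count_xgmi_links)
  echo "XGMI entries in link-type table: $n"
else
  echo "rocm-smi not found on PATH"
fi
```

A count of 0 with the jumper or bridge installed would suggest the system did not boot into the patched kernel.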
While more testing is still needed, this is quite the achievement for the community. Thank you again, anonymous redditor.
I have been using my Mac Pro 2019 with dual AMD Radeon PRO W6800X Duo GPUs for local AI inference for some time now, and for most of that time I did not have any BAR-related problems. However, since I moved from Proxmox to Ubuntu 24 on bare metal, I have started noticing some BAR warnings and errors.
It seems that this problem may come from the way the Mac Pro firmware allocates PCIe resources before Linux takes over, specifically when using Duo MPX GPUs.
One redditor, whose account is now deleted, shared a GitHub link to what I can only describe as someone's documentation of how he fixed the BAR issue on Vega II Duo GPUs. I have dubbed this nbritton's method.
Our goal now is to use nbritton's method, adapted for the W6800X Duo. I also tried to make it work as a copy and paste solution for the Vega II Duo, but I have not tested it.
Warning: This changes GPU driver load order and PCIe BAR allocation behavior. If something goes wrong, you may need to boot from a recovery kernel, remove the service, or undo the GRUB changes. Also, note that SGLang's AMD GPU documentation recommends pci=realloc=off iommu=pt, which conflicts with nbritton's method because nbritton's method depends on PCIe BAR reallocation behavior. In other words, pci=realloc must not be disabled for this method.
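Before going further, it may help to check whether your current boot already carries the conflicting option. This is a minimal sketch: `disables_pci_realloc` is a hypothetical helper of mine that looks for `realloc=off` among the `pci=` options on a kernel command line, including when it is combined with other `pci=` options using commas.

```shell
# Hypothetical helper: succeeds if a kernel command line contains pci=realloc=off.
disables_pci_realloc() (
  set -f                      # no filename globbing while splitting words
  for word in $1; do
    case $word in
      pci=*)
        opts=${word#pci=}
        IFS=,                 # pci= options can be comma-separated
        for opt in $opts; do
          [ "$opt" = "realloc=off" ] && exit 0
        done
        ;;
    esac
  done
  exit 1
)

if [ -r /proc/cmdline ]; then
  if disables_pci_realloc "$(cat /proc/cmdline)"; then
    echo "pci=realloc=off is set; this conflicts with nbritton's method"
  else
    echo "pci=realloc=off is not set"
  fi
fi
```

If it is set, remove it from your GRUB configuration before applying this section.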
Let's start.
We will do the following:
Copy the following command into your command line interface of choice:
curl -fsSL "https://raw.githubusercontent.com/FaisalBiyari/MacPro2019LocalAI/refs/heads/main/Reddit/Mac%20Pro%202019%20Local%20AI%20Guide%3A%20Ubuntu%2024.04%2C%20ROCm%207.2.3%2C%20PyTorch%202.10%2C%20and%20Infinity%20Fabric%20Link/4.%20AMD%20Duo%20MPX%20GPUs%20and%20Setting%20BAR%20Correctly" | bash
After completing the linked sections above, we should have:
Once you're done, please reboot to make sure everything sticks. Then repeat Step 07: Verifying Everything, above, to confirm that everything is set up correctly.
Now that the infrastructure is ready, it's time to move to our frameworks of choice.
While I definitely plan to expand, I have focused mainly on text generation. When I first started, I considered Ollama, llama.cpp, and vLLM. I now see newer options as well, such as SGLang.
I am excited to share that vLLM supports this setup and works well. I hope to release a separate guide for it soon.
For the purpose of this guide, I will continue with Ollama for its simplicity, in a Hello World type scenario.
Step 01: Install and Configure Ollama
We will do the following:
ollama.service vs. ollama serve separate model libraries
Copy the following command into your command line interface of choice:
curl -fsSL "https://raw.githubusercontent.com/FaisalBiyari/MacPro2019LocalAI/refs/heads/main/Reddit/Mac%20Pro%202019%20Local%20AI%20Guide%3A%20Ubuntu%2024.04%2C%20ROCm%207.2.3%2C%20PyTorch%202.10%2C%20and%20Infinity%20Fabric%20Link/6.%20Local%20AI/Step%2001%3A%20Install%20and%20Configure%20Ollama" | bash
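On the separate model libraries: as far as I understand Ollama's defaults on Linux, the systemd service runs as its own `ollama` user and keeps models under `/usr/share/ollama/.ollama/models`, while `ollama serve` started from your own shell uses `~/.ollama/models`, unless `OLLAMA_MODELS` points somewhere else. The sketch below just reports both locations. `effective_models_dir` is a hypothetical helper, and the service path is an assumption that the script above may have changed.

```shell
# Hypothetical helper: print the model directory an `ollama serve` started
# from this shell would use (OLLAMA_MODELS wins over the per-user default).
effective_models_dir() {
  printf '%s\n' "${OLLAMA_MODELS:-$HOME/.ollama/models}"
}

echo "ollama serve (this shell): $(effective_models_dir)"
# Assumed default for the systemd service set up by Ollama's Linux installer.
service_dir=/usr/share/ollama/.ollama/models
for dir in "$(effective_models_dir)" "$service_dir"; do
  if [ -d "$dir" ]; then
    echo "$dir: $(du -sh "$dir" 2>/dev/null | cut -f1)"
  else
    echo "$dir: not present"
  fi
done
```

If both directories exist and both have content, the same model may have been downloaded twice, once per library.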
Step 02: Verify Ollama Setup
We will do the following:
Copy the following command into your command line interface of choice:
curl -fsSL "https://raw.githubusercontent.com/FaisalBiyari/MacPro2019LocalAI/refs/heads/main/Reddit/Mac%20Pro%202019%20Local%20AI%20Guide%3A%20Ubuntu%2024.04%2C%20ROCm%207.2.3%2C%20PyTorch%202.10%2C%20and%20Infinity%20Fabric%20Link/6.%20Local%20AI/Step%2002%3A%20Verify%20Ollama%20Setup" | bash
Step 03: Download and Run Models
We will do the following:
Copy the following command into your command line interface of choice:
ollama run qwen3.5:0.8b --verbose
You can find more models on Ollama's website. Below are some other models I am considering:
ollama pull qwen3.6:27b
ollama pull gemma4:31b-it-q4_K_M
ollama pull granite4.1:30b
ollama pull medgemma:27b
ollama pull mistral-medium-3.5:128b
ollama pull gpt-oss:120b
ollama pull qwen3.5:122b
ollama pull nemotron-3-super:120b
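Besides the interactive `ollama run`, the same model can be reached over Ollama's local HTTP API, which is what most tools and agents will use. This is a minimal sketch, assuming Ollama is listening on its default port 11434 and that the model from the `ollama run` command above has already been downloaded. `generate_payload` is a hypothetical helper, and it does not JSON-escape its arguments.

```shell
# Hypothetical helper: build a minimal /api/generate request body.
# It does not JSON-escape its arguments, so avoid quotes and backslashes.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$1" "$2"
}

# Only attempt the request if something answers on Ollama's default port.
if curl -fsS --max-time 2 http://localhost:11434/api/version >/dev/null 2>&1; then
  curl -fsS http://localhost:11434/api/generate \
    -d "$(generate_payload "qwen3.5:0.8b" "Say hello in one sentence.")" \
    || echo "Request failed; is the model downloaded?"
else
  echo "Ollama is not reachable on localhost:11434"
fi
```

With `"stream":false`, the reply comes back as a single JSON object instead of a stream of partial responses.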
With this, we are done with this guide.
It has been a long journey setting up this infrastructure, and preparing for the actual goal.
My testing was done on Mac Pro 2019 systems with dual W6900X MPX modules and dual W6800X Duo MPX modules. I have not tested this with Vega II or Vega II Duo MPX GPU modules.
Next, I plan to focus on vLLM for a while: optimization, quantization, and automation of operations.
After that, I hope to dive into Hermes Agent by Nous, with the hope of building multiple agents around a few local models run on vLLM, communicating and working together.
Expanding to images or vision, as well as to voice, is also in the pipeline.
The possibilities are endless. I hope to hear what everyone else experiences with this guide and with local AI in general: what worked, what failed, what workloads you are running, what use cases you care about, what problems you hit, and what solutions you found.
Looking forward to seeing how everyone takes advantage of this guide, and local AI.
Credit where credit is due. A lot of the information here was gathered from the community in bits and pieces.
I do want to take the opportunity to thank the anonymous redditor for his/her contribution (creating the whole kernel patch). THANK YOU!
Nikolas Britton for the nbritton method, fixing the BAR issue on the AMD Duo MPX GPUs.
u/AdityaGarg8 for always being supportive, no questions asked.
My AI of choice, for the support through all of this.
r/MacPro2019LocalAI redditors, for keeping in touch, and motivating me to continue going. You guys are the real MVPs.
Disclaimer: I wrote this post myself. I also used AI as a tool to help clean up the wording and formatting.
Resources: