u/BeginningReveal2620

Here is the real information I could find on getting the HP Z2 G1a Strix Halo working. It likely still has a lot of errors, given all of the misinformation in the channel.

Anyone else have an HP Z2 G1a Strix Halo that is tuned up correctly with Linux?

Thanks!

# HP Z2 Mini G1a (Strix Halo / Ryzen AI Max+ 395) — LLM Inference Optimization Guide

*Current as of May 2026. Optimized for local LLM inference (Ollama / llama.cpp). Strips the YouTube noise; everything here is checked against the current ROCm/kernel state for gfx1151.*

---

## TL;DR — what actually moves the needle

For LLM inference on this box, in order of impact:

1. **Which ROCm build you run**: biggest single lever, bigger than any BIOS tweak.
2. **Kernel version**: gfx1151 fixes are landing in Ubuntu's OEM kernel line, not where you'd expect.
3. **BIOS UMA / VRAM allocation**: gates how much of your 128GB the GPU can touch.
4. **Thermal sustain**: the chassis is conservative; repaste + airflow keeps clocks up on long sessions.
5. **Storage thermals**: cheap NVMe throttles hard under model loads.

Everything else is secondary. CPU overclocking, "debloat" rituals, registry hacks — irrelevant. This is a unified-memory inference appliance, not an AM5 gaming desktop.

---

## The platform reality (what HP locked, and why it doesn't matter much for inference)

HP welded platform logic into BIOS policy, the embedded controller, thermal tables, and OEM USB4/power management. You will **not** get unlocked PBO, voltage tuning, or traditional overclocking. This is not AM5.

But for inference that mostly doesn't matter — token throughput on this chip is **memory-bandwidth bound**, not core-clock bound. The 8060S iGPU pulls from the full unified memory pool at ~215–256 GB/s. You optimize bandwidth utilization and thermal sustain, not clocks. So the locked BIOS costs you almost nothing on the workload you actually care about.
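
To put a number on "bandwidth bound", here is a back-of-envelope sketch. It assumes each generated token reads every active weight once, so the ceiling is roughly bandwidth divided by the bytes read per token. The helper name `tok_ceiling` and the 20 GB figure are illustrative assumptions, not measurements from this box.

```bash
# Rough decode ceiling: tokens/s <= memory bandwidth / bytes read per token.
# $1 = bandwidth in GB/s, $2 = GB of weights read per token (assumed input).
tok_ceiling() {
  awk -v bw="$1" -v gb="$2" 'BEGIN { printf "%.1f\n", bw / gb }'
}

# Bandwidth range quoted above, applied to a hypothetical 20 GB of active weights.
echo "low end:  $(tok_ceiling 215 20) tok/s"
echo "high end: $(tok_ceiling 256 20) tok/s"
```

Real throughput lands below this ceiling. MoE models read only their active experts per token, so the GB-per-token input is much smaller than the file size on disk.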

---

## 1. The driver stack — your single most important decision

This is where every out-of-date guide goes wrong. The situation in mid-2026:

- gfx1151 is **still not on AMD's official ROCm support matrix** — it's marked **"Preview"** in ROCm 7.2, with official framework support limited to PyTorch on Linux. The supported RDNA3 targets remain gfx1100/gfx1101.
- The **community stack (TheRock nightlies) is far ahead of stable.** Stock ROCm 7.2 is slow on gfx1151; TheRock 7.11 nightlies are dramatically faster — described as a night-and-day difference by people running it daily.
- The old **`HSA_OVERRIDE_GFX_VERSION=11.5.1` hack is obsolete on nightlies** — they ship native gfx1151 kernels. You only still need the override on *stable* ROCm builds that don't ship gfx1151 kernels.
- The `HSA_ENABLE_SDMA=0` / `HSA_USE_SVM=0` workarounds are also being retired — not needed on the latest 7.11 nightlies, **provided** you're on a kernel with the fix (Ubuntu OEM kernel 1018+).
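
Before deciding whether any override is needed, it helps to see what the installed runtime actually reports. A minimal sketch, assuming `rocminfo` is on the PATH; the helper name `gfx_targets` is made up for this example.

```bash
# Extract the distinct gfx ISA names from rocminfo-style text on stdin.
gfx_targets() {
  grep -oE 'gfx[0-9a-f]+' | sort -u
}

if command -v rocminfo >/dev/null 2>&1; then
  echo "ISA seen by the runtime: $(rocminfo | gfx_targets | tr '\n' ' ')"
  echo "HSA_OVERRIDE_GFX_VERSION=${HSA_OVERRIDE_GFX_VERSION:-<unset>}"
else
  echo "rocminfo not found; is a ROCm runtime installed and on PATH?"
fi
```

If the runtime lists gfx1151 with the override unset, the build is talking to the chip natively.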

### Your two paths (you decide)

**Path A — Stable (ROCm 7.2 Preview)**
- Pros: reproducible, won't break on `apt upgrade`, fine for Ollama which bundles its own ROCm.
- Cons: measurably slower on gfx1151; you'll likely still need `HSA_OVERRIDE_GFX_VERSION=11.5.1`.
- Pick this if the box needs to *just work* and sit untouched.

**Path B — TheRock nightlies (7.11)**
- Pros: the real performance. Native gfx1151 kernels, no override hacks, big speedups on inference and especially on diffusion/video if you ever branch out.
- Cons: it's nightly — pin a known-good build, don't blind-update. Requires the OEM kernel for the SDMA/SVM fix.
- Pick this if you're chasing tokens/sec and willing to manage versions.

**My recommendation for pure Ollama/llama.cpp inference:** Ollama bundles its own ROCm runtime, so the simplest high-performance route is **Ollama with the gfx1151 override set in the systemd unit** (Path A-ish), and only go to TheRock nightlies if you're hand-building llama.cpp and want the last 20–30%.

---

## 2. OS / distro — the Fedora-vs-Ubuntu fight, settled for *this chip*

The generic rule "Fedora is ahead on new AMD hardware" was true a year ago and is **inverted for gfx1151 right now.** The specific Strix Halo fixes are landing in **Ubuntu's OEM kernel line**, and the working setups people are actually running in the field are Ubuntu-family:

- Ubuntu 24.04 LTS + **OEM kernel (1018+)** — the recommended combo for ROCm 7.2.
- Ubuntu 25.10 / Kubuntu 25.10 (kernel 6.19) — newer, working well for Ollama.

Fedora ships newer *mainline* kernels, but mainline isn't where the gfx1151 patches are landing first. So for this box: **Ubuntu 24.04 LTS + OEM kernel** is the pragmatic pick. Target kernel **6.18–6.19**.

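Flatten the HP Windows preload; don't try to clean it. Wolf Security, endpoint hooks, updater spam: all gone with the wipe. Sure Start is the exception: it lives in firmware, survives the reinstall, and is a BIOS-section decision. Fresh Ubuntu install, then immediately move to the OEM kernel.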

```bash
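# After base Ubuntu 24.04 install (OEM kernel line; target is 6.18+, confirm with `uname -r` after the reboot):
sudo apt update && sudo apt install linux-oem-24.04d
# GPU device nodes are group-restricted; new membership applies at next login
sudo usermod -aG render,video "$USER"
sudo reboot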
```
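
After the reboot, a quick sanity check confirms the kernel and group changes took. This is a sketch: the helper name `kernel_at_least` is invented here, and 6.18 is simply the target quoted in this guide.

```bash
# True if kernel release $1 (e.g. "6.19.0-1018-oem") is >= minimum $2 ("major.minor").
kernel_at_least() {
  have_major=${1%%.*}
  have_rest=${1#*.}
  have_minor=${have_rest%%[!0-9]*}
  want_major=${2%%.*}
  want_minor=${2#*.}
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

if kernel_at_least "$(uname -r)" 6.18; then
  echo "kernel $(uname -r): meets the 6.18 target"
else
  echo "kernel $(uname -r): older than the 6.18 target"
fi

# Confirm the current session picked up both GPU groups.
for g in render video; do
  if id -nG | tr ' ' '\n' | grep -qx "$g"; then
    echo "group $g: ok"
  else
    echo "group $g: missing (log out and back in, or re-run usermod)"
  fi
done
```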

---

## 3. BIOS — only the settings that matter

Don't go disabling security-processor / boot-integrity features at random; HP firmware recovery is genuinely painful. The settings that matter for inference:

- **UMA / GPU memory allocation** — the one that counts. Set the dedicated VRAM split high (commonly 96GB on a 128GB box). This is what lets the iGPU address the big pool. After boot, confirm with `rocm-smi --showmeminfo vram` — you want ~96–110GB visible.
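- **Disable** remote management / wake-on-LAN garbage you don't use, for stability and clean boots. Sure Start is a boot-integrity feature, so the warning above applies: only turn it off if you know the recovery path.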
- **Leave** TPM/secure-boot alone unless you have a specific reason and know the recovery path.
- USB4/Thunderbolt: only relevant if you're attaching external storage/eGPU. HP ships conservative PCIe-tunneling policy buried under Security → Port Options. Relax only what you need.
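
If `rocm-smi` isn't installed yet, the amdgpu driver exposes the same totals through sysfs. A small sketch, assuming the standard amdgpu `mem_info_*` files; the helper name `bytes_to_gib` is made up for this example.

```bash
# Convert a byte count to whole GiB.
bytes_to_gib() {
  echo $(( $1 / 1073741824 ))
}

# amdgpu reports the BIOS carve-out (VRAM) and the shared pool (GTT) per card.
for dev in /sys/class/drm/card*/device; do
  if [ -r "$dev/mem_info_vram_total" ] && [ -r "$dev/mem_info_gtt_total" ]; then
    vram=$(cat "$dev/mem_info_vram_total")
    gtt=$(cat "$dev/mem_info_gtt_total")
    echo "$dev: VRAM $(bytes_to_gib "$vram") GiB, GTT $(bytes_to_gib "$gtt") GiB"
  fi
done
```

A VRAM figure far below what you set in BIOS usually means the setting didn't stick or the wrong card node is being read.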

---

## 4. Thermal — the real sustained-performance unlock

The chassis is fine; HP's stock thermal implementation is tuned for acoustics and warranty, not sustained AI load. The enemy on long inference sessions is **LPDDR + VRM heat saturation**, which drops clocks. For reference, people running sustained ML on this silicon see the GPU at **87–91°C and ~120W** — so headroom matters.

Worth doing:
- **PTM7950 phase-change pad** repaste — the single best thermal mod.
- **Better SSD thermal pads** + ensure good heatsink contact on the NVMe.
- **Airflow spacing** — don't box it in; if it's rack/flypack mounted, add external airflow.

Monitoring caveat: **`amd-smi` is currently blind on gfx1151** — it reports N/A for power/temp/clocks/fan even though the kernel exposes the data through sysfs/hwmon. So read temps via `sensors` (lm-sensors) and the hwmon sysfs nodes, not amd-smi.

```bash
sudo apt install lm-sensors && sudo sensors-detect --auto
watch -n2 sensors
```
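
For logging or scripting, the hwmon nodes can also be read directly. This sketch assumes the amdgpu hwmon device exposes `temp1_input` and one of `power1_average` / `power1_input`; which power file exists varies by chip and kernel, so both are tried. The helper names are invented here.

```bash
# hwmon reports temperature in millidegrees Celsius and power in microwatts.
millideg_to_c() { echo $(( $1 / 1000 )); }
microwatt_to_w() { echo $(( $1 / 1000000 )); }

for hw in /sys/class/hwmon/hwmon*; do
  if [ -r "$hw/name" ] && [ "$(cat "$hw/name")" = "amdgpu" ]; then
    if [ -r "$hw/temp1_input" ]; then
      echo "GPU temp:  $(millideg_to_c "$(cat "$hw/temp1_input")") C"
    fi
    for p in power1_average power1_input; do
      if [ -r "$hw/$p" ]; then
        echo "GPU power: $(microwatt_to_w "$(cat "$hw/$p")") W ($p)"
        break
      fi
    done
  fi
done
```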

---

## 5. Fan control

HP's EC firmware is restrictive — you're fighting embedded-controller logic, not a desktop board. Under Linux, `lm-sensors` + a curve tool will get you *some* control, but don't expect full desktop-style fan curves. Set realistic expectations: the lever here is mostly thermal mods + airflow, with fan control as a secondary assist.
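
A quick way to see how much control the EC actually hands over is to check for PWM nodes in hwmon. A sketch only: `list_pwm_nodes` is a made-up helper, and whether any nodes show up on this machine is exactly the open question.

```bash
# List pwm control files under a hwmon root (defaults to the real sysfs tree).
list_pwm_nodes() {
  root=${1:-/sys/class/hwmon}
  for node in "$root"/hwmon*/pwm[0-9]; do
    if [ -e "$node" ]; then
      echo "$node"
    fi
  done
  return 0
}

nodes=$(list_pwm_nodes)
if [ -n "$nodes" ]; then
  echo "PWM controls exposed:"
  echo "$nodes"
else
  echo "no pwm nodes exposed; software fan curves are not available on this boot"
fi
```

If nodes do appear, `pwmconfig` from the `fancontrol` package is the usual next step for probing which fan each one drives.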

---

## 6. Storage

AI workloads hammer storage on model load/swap, and cheap drives thermal-throttle badly in this chassis. Use high-endurance TLC with DRAM:

- Samsung 990 Pro, WD SN850X, or Solidigm/enterprise-class TLC.
- Avoid DRAM-less and QLC drives, and cheap OEM pulls.
- Dual fast NVMe is the strong config (OS on one, models on the other).
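
To catch a throttling drive, compare each NVMe's temperature against the warning limit the drive itself reports. This sketch assumes the kernel's nvme hwmon support is enabled; `temp1_max` is optional and not every drive publishes it. The helper name `at_or_over` is made up.

```bash
# True if reading $1 has reached limit $2 (both in millidegrees Celsius).
at_or_over() { [ "$1" -ge "$2" ]; }

for hw in /sys/class/hwmon/hwmon*; do
  if [ -r "$hw/name" ] && [ "$(cat "$hw/name")" = "nvme" ] && [ -r "$hw/temp1_input" ]; then
    now=$(cat "$hw/temp1_input")
    msg="$hw: $(( now / 1000 )) C"
    if [ -r "$hw/temp1_max" ]; then
      max=$(cat "$hw/temp1_max")
      if at_or_over "$now" "$max"; then
        msg="$msg (at or above the drive's warning limit)"
      fi
    fi
    echo "$msg"
  fi
done
```

Run it during a large model load, since that is when the drive is hottest.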

---

## 7. Inference tuning that actually helps

- **VRAM allocation** (BIOS UMA, above) is the gate — get this right first.
- **Ollama gfx1151 systemd override** (only needed on stable ROCm):

```bash
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.5.1"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=-1"
EOF
sudo systemctl daemon-reload && sudo systemctl restart ollama
```
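
It's worth confirming systemd actually picked up the drop-in, since a typo in the path fails silently. A minimal sketch, assuming the service is named `ollama` as above; `has_env` is a helper invented for this example.

```bash
# True if variable name $2 appears in a systemd Environment= string $1.
has_env() {
  case " $1 " in
    *" $2="*) return 0 ;;
    *) return 1 ;;
  esac
}

if command -v systemctl >/dev/null 2>&1; then
  env_line=$(systemctl show ollama --property=Environment --value 2>/dev/null || true)
  for var in HSA_OVERRIDE_GFX_VERSION OLLAMA_FLASH_ATTENTION OLLAMA_KEEP_ALIVE; do
    if has_env "$env_line" "$var"; then
      echo "$var: set"
    else
      echo "$var: missing"
    fi
  done
fi
```

Once a model is loaded, `ollama ps` shows whether it is running on the GPU or has fallen back to CPU.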

- **ROCm over Vulkan.** On gfx1151, ROCm is the fast, reliable backend. The Vulkan backend hangs or crashes on several current models (Qwen3.5 family especially) — GPU shows 100% but produces no tokens. Don't chase Vulkan for inference here.
- **llama.cpp build** (if hand-building): `-DGGML_HIP=ON -DAMDGPU_TARGETS="gfx1151"`, run with `-ngl 99 --no-mmap`.
- **Flash attention on**, keep models resident (`KEEP_ALIVE=-1`) so you're not paying reload cost.
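
For the hand-build route, the flags above slot into a standard CMake invocation roughly like this. Treat it as a sketch: the checkout path (`LLAMA_SRC`), the helper `llama_cmake_args`, and the `hipconfig`-based compiler setup are assumptions about your environment, and the block skips itself if any of them is missing.

```bash
# Assemble the CMake flags named above for a given GPU target.
llama_cmake_args() {
  echo "-DGGML_HIP=ON -DAMDGPU_TARGETS=$1 -DCMAKE_BUILD_TYPE=Release"
}

src=${LLAMA_SRC:-./llama.cpp}   # hypothetical checkout location
if [ -d "$src" ] && command -v cmake >/dev/null 2>&1 && command -v hipconfig >/dev/null 2>&1; then
  # Word splitting of the flag string is intentional here.
  # shellcheck disable=SC2046
  HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
    cmake -S "$src" -B "$src/build" $(llama_cmake_args gfx1151)
  cmake --build "$src/build" --config Release -j "$(nproc)"
  echo "run: $src/build/bin/llama-cli -m <model.gguf> -ngl 99 --no-mmap"
else
  echo "skipping build: need a llama.cpp checkout at $src plus cmake and hipconfig"
fi
```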

### Rough performance expectations
- ~40 tok/s on a 30B-class model at q8 (e.g. GLM-4.x-flash) is a realistic, reported figure on this chip with ROCm.
- The win is **capacity, not raw speed**: 70B+ models fit entirely in unified memory with no PCIe bottleneck and no VRAM spill. That's the whole point of the box — it does what a 24GB discrete card simply can't.

---

## 8. The honest limitations

- No unlocked BIOS, no PBO, no voltage tuning. Accept it; it doesn't matter for bandwidth-bound inference.
- Official tooling (`amd-smi`) is half-blind on this chip — monitor via sysfs/sensors.
- Driver stack is a moving target. Pin your working ROCm/kernel combo and don't blind-update a production box.
- Dense large models are still bandwidth-limited (~10 tok/s class on 70B dense); MoE models are where this chip shines.
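
One low-effort way to pin a working combo is to record it the moment things work, then hold the kernel metapackage. A sketch under the assumption of an apt-based install; `snapshot_stack` is a made-up helper and the package globs may not match how your ROCm build was installed.

```bash
# Print the pieces of the stack worth recording next to a known-good setup.
snapshot_stack() {
  echo "kernel=$(uname -r)"
  if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${Package}=${Version}\n' 'rocm*' 'linux-oem*' 2>/dev/null || true
  fi
  if command -v ollama >/dev/null 2>&1; then
    echo "ollama=$(ollama --version 2>/dev/null | tail -n 1)"
  fi
}

snapshot_stack
# To freeze the kernel metapackage once the combo works (undo with apt-mark unhold):
#   sudo apt-mark hold linux-oem-24.04d
```

Redirect the output to a dated file so there is something to diff against after the next update breaks things.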

---

## Bottom line

The winning Z2 G1a inference build today:

- Ubuntu 24.04 LTS + OEM kernel (6.18+)
- ROCm: Ollama-bundled stable for "just works," or TheRock 7.11 nightlies (pinned) for max speed
- BIOS UMA set high (~96GB VRAM)
- PTM7950 repaste + good NVMe pads + airflow
- ROCm backend, not Vulkan; flash attention on; models resident
- Fast TLC NVMe, dual-drive

Optimize bandwidth and thermal sustain. Ignore the overclocking/debloat theater. The hardware is a sovereign-AI-grade unified-memory node; the firmware is still 2019 enterprise IT — and for inference, that mismatch barely costs you anything once the driver stack is right.

HP Z2 G1a Strix Halo - Reality Check?