u/BeginningReveal2620

HP Z2 G1a Strix Halo - Reality Check?

Here is the most reliable information I could find for getting the HP Z2 G1a Strix Halo working. It likely still contains a lot of errors, given all of the misinformation in the channel.

Anyone else have an HP Z2 G1a Strix Halo that is tuned up correctly with Linux?

Thanks!

# HP Z2 Mini G1a (Strix Halo / Ryzen AI Max+ 395) — LLM Inference Optimization Guide

*Current as of May 2026. Optimized for local LLM inference (Ollama / llama.cpp). Strips the YouTube noise; everything here is checked against the current ROCm/kernel state for gfx1151.*

---

## TL;DR — what actually moves the needle

For LLM inference on this box, in order of impact:

  1. **Which ROCm build you run** — biggest single lever, bigger than any BIOS tweak.
  2. **Kernel version** — gfx1151 fixes are landing in Ubuntu's OEM kernel line, not where you'd expect.
  3. **BIOS UMA / VRAM allocation** — gates how much of your 128GB the GPU can touch.
  4. **Thermal sustain** — the chassis is conservative; repaste + airflow keeps clocks up on long sessions.
  5. **Storage thermals** — cheap NVMe throttles hard under model loads.

Everything else is secondary. CPU overclocking, "debloat" rituals, registry hacks — irrelevant. This is a unified-memory inference appliance, not an AM5 gaming desktop.

---

## The platform reality (what HP locked, and why it doesn't matter much for inference)

HP welded platform logic into BIOS policy, the embedded controller, thermal tables, and OEM USB4/power management. You will **not** get unlocked PBO, voltage tuning, or traditional overclocking. This is not AM5.

But for inference that mostly doesn't matter — token throughput on this chip is **memory-bandwidth bound**, not core-clock bound. The 8060S iGPU pulls from the full unified memory pool at ~215–256 GB/s. You optimize bandwidth utilization and thermal sustain, not clocks. So the locked BIOS costs you almost nothing on the workload you actually care about.
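To make the bandwidth-bound claim concrete, here is a back-of-envelope sketch. It assumes each generated token requires reading all active weights once, so decode speed is capped at bandwidth divided by bytes read per token. The model sizes in the calls are illustrative assumptions, not measurements from this guide.

```bash
# Rough upper bound on decode speed for a bandwidth-bound model:
#   tok/s <= memory bandwidth (GB/s) / weights read per token (GB)
# Ignores KV-cache reads, compute, and overhead, so real numbers land lower.
tok_ceiling() {
  local bw_gbs="$1" active_gb="$2"
  awk -v bw="$bw_gbs" -v gb="$active_gb" 'BEGIN { printf "%.1f\n", bw / gb }'
}

# Illustrative sizes (assumptions): a dense 70B model at roughly 4.5 bits per
# weight is about 40 GB; an MoE with ~3B active parameters at q8 reads ~3 GB.
tok_ceiling 215 40   # dense 70B, low end of the quoted bandwidth range
tok_ceiling 256 40   # dense 70B, high end
tok_ceiling 256 3    # small-active MoE
```

Under these assumptions the dense ceiling stays in the single digits while the MoE ceiling is far higher, which is why clocks matter so little here compared with how many bytes each token has to pull from memory.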

---

## 1. The driver stack — your single most important decision

This is where every out-of-date guide goes wrong. The situation in mid-2026:

- gfx1151 is **still not on AMD's official ROCm support matrix** — it's marked **"Preview"** in ROCm 7.2, with official framework support limited to PyTorch on Linux. The supported RDNA3 targets remain gfx1100/gfx1101.
- The **community stack (TheRock nightlies) is far ahead of stable.** Stock ROCm 7.2 is slow on gfx1151; TheRock 7.11 nightlies are dramatically faster — described as a night-and-day difference by people running it daily.
- The old **`HSA_OVERRIDE_GFX_VERSION=11.5.1` hack is obsolete on nightlies** — they ship native gfx1151 kernels. You only still need the override on *stable* ROCm builds that don't ship gfx1151 kernels.
- The `HSA_ENABLE_SDMA=0` / `HSA_USE_SVM=0` workarounds are also being retired — not needed on the latest 7.11 nightlies, **provided** you're on a kernel with the fix (Ubuntu OEM kernel 1018+).
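A quick way to see which situation you are in is to ask the runtime what ISA it reports. The sketch below parses `rocminfo` output; the helper names are made up for illustration, and the interpretation of a missing gfx1151 entry is an assumption (an active override can change the reported name).

```bash
# Extract the unique gfx ISA names from `rocminfo` output read on stdin.
gfx_targets() {
  grep -oE 'gfx[0-9a-f]+' | sort -u
}

# Succeed if gfx1151 appears among the reported targets.
has_native_gfx1151() {
  gfx_targets | grep -qx 'gfx1151'
}

if command -v rocminfo >/dev/null 2>&1; then
  if rocminfo | has_native_gfx1151; then
    echo "runtime reports gfx1151"
  else
    echo "gfx1151 not reported; an override may be masking it or the build lacks support"
  fi
else
  echo "rocminfo not found; install ROCm first"
fi
```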

### Your two paths (you decide)

**Path A — Stable (ROCm 7.2 Preview)**
- Pros: reproducible, won't break on `apt upgrade`, fine for Ollama which bundles its own ROCm.
- Cons: measurably slower on gfx1151; you'll likely still need `HSA_OVERRIDE_GFX_VERSION=11.5.1`.
- Pick this if the box needs to *just work* and sit untouched.

**Path B — TheRock nightlies (7.11)**
- Pros: the real performance. Native gfx1151 kernels, no override hacks, big speedups on inference and especially on diffusion/video if you ever branch out.
- Cons: it's nightly — pin a known-good build, don't blind-update. Requires the OEM kernel for the SDMA/SVM fix.
- Pick this if you're chasing tokens/sec and willing to manage versions.

**My recommendation for pure Ollama/llama.cpp inference:** Ollama bundles its own ROCm runtime, so the simplest high-performance route is **Ollama with the gfx1151 override set in the systemd unit** (Path A-ish), and only go to TheRock nightlies if you're hand-building llama.cpp and want the last 20–30%.

---

## 2. OS / distro — the Fedora-vs-Ubuntu fight, settled for *this chip*

The generic rule "Fedora is ahead on new AMD hardware" was true a year ago and is **inverted for gfx1151 right now.** The specific Strix Halo fixes are landing in **Ubuntu's OEM kernel line**, and the working setups people are actually running in the field are Ubuntu-family:

- Ubuntu 24.04 LTS + **OEM kernel (1018+)** — the recommended combo for ROCm 7.2.
- Ubuntu 25.10 / Kubuntu 25.10 (kernel 6.19) — newer, working well for Ollama.

Fedora ships newer *mainline* kernels, but mainline isn't where the gfx1151 patches are landing first. So for this box: **Ubuntu 24.04 LTS + OEM kernel** is the pragmatic pick. Target kernel **6.18–6.19**.

Flatten the HP Windows preload — don't try to clean it. Sure Start, Wolf, endpoint hooks, updater spam: all gone. Fresh Ubuntu install, then immediately move to the OEM kernel.

```bash
# After base Ubuntu 24.04 install:
sudo apt update && sudo apt install linux-oem-24.04d   # gets you the 6.18+ OEM kernel line
sudo usermod -aG render,video "$USER"   # GPU device access without root
sudo reboot                             # boot the OEM kernel and pick up the new groups
```
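After the reboot it is worth confirming that the kernel, group membership, and device nodes are what you expect before touching ROCm. This is a minimal sketch; `in_group` is a hypothetical helper, and the device paths are the usual amdgpu ones rather than anything specific to this machine.

```bash
# Succeed if the space-separated group list in $1 contains group $2.
in_group() {
  case " $1 " in *" $2 "*) return 0 ;; *) return 1 ;; esac
}

groups_now="$(id -nG)"
for g in render video; do
  if in_group "$groups_now" "$g"; then
    echo "ok: member of $g"
  else
    echo "missing: $g (log out and back in, or re-run usermod)"
  fi
done

# /dev/kfd is the ROCm compute node; /dev/dri/renderD* are the render nodes.
ls -l /dev/kfd /dev/dri/renderD* 2>/dev/null || echo "no GPU device nodes found"

# Compare against the OEM kernel version you intended to boot.
uname -r
```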

---

## 3. BIOS — only the settings that matter

Don't go disabling security-processor / boot-integrity features at random; HP firmware recovery is genuinely painful. The settings that matter for inference:

- **UMA / GPU memory allocation** — the one that counts. Set the dedicated VRAM split high (commonly 96GB on a 128GB box). This is what lets the iGPU address the big pool. After boot, confirm with `rocm-smi --showmeminfo vram` — you want ~96–110GB visible.
- **Disable** Sure Start / remote management / wake-on-LAN garbage you don't use — stability and clean boots.
- **Leave** TPM/secure-boot alone unless you have a specific reason and know the recovery path.
- USB4/Thunderbolt: only relevant if you're attaching external storage/eGPU. HP ships conservative PCIe-tunneling policy buried under Security → Port Options. Relax only what you need.
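As a cross-check on the UMA setting that does not depend on the ROCm tools, the amdgpu driver publishes memory totals in sysfs. The sketch below reads them and converts to GiB; `bytes_to_gib` is a helper invented for this example, and the split between VRAM and GTT that you see will depend on your BIOS choice.

```bash
# Convert a byte count to GiB with one decimal place.
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}

# amdgpu exposes memory totals in bytes under each card's device directory.
for dev in /sys/class/drm/card*/device; do
  [ -r "$dev/mem_info_vram_total" ] || continue
  echo "$dev: VRAM $(bytes_to_gib "$(cat "$dev/mem_info_vram_total")") GiB," \
       "GTT $(bytes_to_gib "$(cat "$dev/mem_info_gtt_total")") GiB"
done
```

If the VRAM figure is far below what you set in BIOS, the setting probably did not stick or a firmware update reset it.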

---

## 4. Thermal — the real sustained-performance unlock

The chassis is fine; HP's stock thermal implementation is tuned for acoustics and warranty, not sustained AI load. The enemy on long inference sessions is **LPDDR + VRM heat saturation**, which drops clocks. For reference, people running sustained ML on this silicon see the GPU at **87–91°C and ~120W** — so headroom matters.

Worth doing:
- **PTM7950 phase-change pad** repaste — the single best thermal mod.
- **Better SSD thermal pads** + ensure good heatsink contact on the NVMe.
- **Airflow spacing** — don't box it in; if it's rack/flypack mounted, add external airflow.

Monitoring caveat: **`amd-smi` is currently blind on gfx1151** — it reports N/A for power/temp/clocks/fan even though the kernel exposes the data through sysfs/hwmon. So read temps via `sensors` (lm-sensors) and the hwmon sysfs nodes, not amd-smi.

```bash
sudo apt install lm-sensors && sudo sensors-detect --auto
watch -n2 sensors
```
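If you want the raw numbers without `sensors`, the hwmon nodes can be read directly. This is a sketch under the usual hwmon conventions (temperatures in millidegrees Celsius, power in microwatts); `hwmon_report` is a hypothetical helper, and which power file exists varies by kernel, so both common names are tried.

```bash
# Print temperature (and power, where exposed) for every hwmon device whose
# driver name matches $1, scanning under the hwmon root given in $2.
hwmon_report() {
  local want="$1" root="${2:-/sys/class/hwmon}" d t p
  for d in "$root"/hwmon*; do
    [ -r "$d/name" ] && [ "$(cat "$d/name")" = "$want" ] || continue
    for t in "$d"/temp*_input; do
      [ -r "$t" ] || continue
      # hwmon temperatures are reported in millidegrees Celsius
      awk -v f="${t##*/}" '{ printf "%s %.1f C\n", f, $1 / 1000 }' "$t"
    done
    for p in "$d"/power1_average "$d"/power1_input; do
      [ -r "$p" ] || continue
      # hwmon power is reported in microwatts
      awk -v f="${p##*/}" '{ printf "%s %.1f W\n", f, $1 / 1000000 }' "$p"
    done
  done
}

hwmon_report amdgpu
```

The same function works for the SSDs with `hwmon_report nvme`, and wrapping either call in `watch -n2` gives a live view during a long session.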

---

## 5. Fan control

HP's EC firmware is restrictive — you're fighting embedded-controller logic, not a desktop board. Under Linux, `lm-sensors` + a curve tool will get you *some* control, but don't expect full desktop-style fan curves. Set realistic expectations: the lever here is mostly thermal mods + airflow, with fan control as a secondary assist.

---

## 6. Storage

AI workloads hammer storage on model load/swap, and cheap drives thermal-throttle badly in this chassis. Use high-endurance TLC with DRAM:

- Samsung 990 Pro, WD SN850X, or Solidigm/enterprise-class TLC.
- Avoid DRAM-less and QLC drives, and cheap OEM pulls.
- Dual fast NVMe is the strong config (OS on one, models on the other).
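For the dual-drive layout with Ollama, one way to keep models on the second NVMe is the `OLLAMA_MODELS` environment variable. The sketch below only prints a systemd drop-in; the mount point is a hypothetical path, and the apply steps are left as comments so nothing is changed until you run them yourself.

```bash
# Emit a systemd drop-in that points Ollama at a model directory on the
# second drive. The directory passed in is whatever you chose for the
# models NVMe.
models_dropin() {
  printf '[Service]\nEnvironment="OLLAMA_MODELS=%s"\n' "$1"
}

# /mnt/models/ollama is a hypothetical path used only for illustration.
models_dropin /mnt/models/ollama

# To apply (not run here):
#   sudo mkdir -p /etc/systemd/system/ollama.service.d
#   models_dropin /mnt/models/ollama | sudo tee /etc/systemd/system/ollama.service.d/models.conf
#   sudo chown -R ollama:ollama /mnt/models/ollama
#   sudo systemctl daemon-reload && sudo systemctl restart ollama
```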

---

## 7. Inference tuning that actually helps

- **VRAM allocation** (BIOS UMA, above) is the gate — get this right first.
- **Ollama gfx1151 systemd override** (only needed on stable ROCm):

```bash
sudo mkdir -p /etc/systemd/system/ollama.service.d
sudo tee /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="HSA_OVERRIDE_GFX_VERSION=11.5.1"
Environment="OLLAMA_FLASH_ATTENTION=1"
Environment="OLLAMA_KEEP_ALIVE=-1"
EOF
sudo systemctl daemon-reload && sudo systemctl restart ollama
```

- **ROCm over Vulkan.** On gfx1151, ROCm is the fast, reliable backend. The Vulkan backend hangs or crashes on several current models (Qwen3.5 family especially) — GPU shows 100% but produces no tokens. Don't chase Vulkan for inference here.
- **llama.cpp build** (if hand-building): `-DGGML_HIP=ON -DAMDGPU_TARGETS="gfx1151"`, run with `-ngl 99 --no-mmap`.
- **Flash attention on**, keep models resident (`KEEP_ALIVE=-1`) so you're not paying reload cost.
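For the hand-built route, the flags above slot into a standard CMake build. The sketch below wraps them in a dry-run helper so you can preview the commands first; `run` and `build_llama_hip` are names invented here, the checkout path is an assumption, and depending on your ROCm install CMake may also need to be pointed at the HIP compiler (check the llama.cpp build docs).

```bash
# Echo each command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Configure and build llama.cpp with the HIP backend for gfx1151.
# $1 is the path to an existing llama.cpp checkout.
build_llama_hip() {
  local src="$1"
  run cmake -S "$src" -B "$src/build" \
    -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release
  run cmake --build "$src/build" --config Release -j "$(nproc)"
}

# Preview the commands without building anything.
DRY_RUN=1 build_llama_hip "$HOME/llama.cpp"

# Then run with all layers offloaded and mmap disabled:
#   "$HOME/llama.cpp/build/bin/llama-cli" -m /path/to/model.gguf -ngl 99 --no-mmap
```

Drop the `DRY_RUN=1` prefix once the printed commands look right for your setup.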

### Rough performance expectations
- ~40 tok/s on a 30B-class model at q8 (e.g. GLM-4.x-flash) is a realistic, reported figure on this chip with ROCm.
- The win is **capacity, not raw speed**: 70B+ models fit entirely in unified memory with no PCIe bottleneck and no VRAM spill. That's the whole point of the box — it does what a 24GB discrete card simply can't.
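To compare your own box against these figures, measure rather than guess. Ollama's non-streaming `/api/generate` response reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds), which is enough to compute decode speed. The helper name and the model placeholder below are illustrative.

```bash
# Decode speed from Ollama's eval_count (tokens) and eval_duration (ns).
tok_per_sec() {
  awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.1f\n", n / (ns / 1e9) }'
}

tok_per_sec 400 10000000000   # 400 tokens in 10 s

# Against a live server (model name is a placeholder; requires curl and jq):
#   resp="$(curl -s http://localhost:11434/api/generate \
#     -d '{"model":"YOUR_MODEL","prompt":"Explain unified memory.","stream":false}')"
#   tok_per_sec "$(echo "$resp" | jq .eval_count)" "$(echo "$resp" | jq .eval_duration)"
```

For a quick interactive check, `ollama run YOUR_MODEL --verbose` prints an eval rate after each response.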

---

## 8. The honest limitations

- No unlocked BIOS, no PBO, no voltage tuning. Accept it; it doesn't matter for bandwidth-bound inference.
- Official tooling (`amd-smi`) is half-blind on this chip — monitor via sysfs/sensors.
- Driver stack is a moving target. Pin your working ROCm/kernel combo and don't blind-update a production box.
- Dense large models are still bandwidth-limited (~10 tok/s class on 70B dense); MoE models are where this chip shines.
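Pinning is easier if you record what "working" looked like. The sketch below writes the kernel version and any ROCm-related Debian packages to a file; `snapshot_stack`, the file name, and the package-name filter are assumptions for illustration, and a pip-installed nightly would need its own record.

```bash
# Record the kernel and installed ROCm/kernel packages to a file so a
# known-good combination can be compared or restored later.
snapshot_stack() {
  local out="$1"
  {
    echo "date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "kernel: $(uname -r)"
    if command -v dpkg-query >/dev/null 2>&1; then
      dpkg-query -W -f='${Package} ${Version}\n' 2>/dev/null \
        | grep -Ei '^(rocm|hip|hsa|linux-oem|linux-image)' || true
    fi
  } > "$out"
}

snapshot_stack "$HOME/stack-known-good.txt"   # hypothetical file name
cat "$HOME/stack-known-good.txt"

# To stop apt from moving the kernel meta-package installed earlier:
#   sudo apt-mark hold linux-oem-24.04d
```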

---

## Bottom line

The winning Z2 G1a inference build today:

- Ubuntu 24.04 LTS + OEM kernel (6.18+)
- ROCm: Ollama-bundled stable for "just works," or TheRock 7.11 nightlies (pinned) for max speed
- BIOS UMA set high (~96GB VRAM)
- PTM7950 repaste + good NVMe pads + airflow
- ROCm backend, not Vulkan; flash attention on; models resident
- Fast TLC NVMe, dual-drive

Optimize bandwidth and thermal sustain. Ignore the overclocking/debloat theater. The hardware is a sovereign-AI-grade unified-memory node; the firmware is still 2019 enterprise IT — and for inference, that mismatch barely costs you anything once the driver stack is right.

u/BeginningReveal2620 — 13 days ago

I am building a real business, and after months with it, AMD Strix Halo still seems to be more science project than production-ready silicon. Has anyone had any luck with real production-grade AMD Strix Halo clusters?

u/BeginningReveal2620 — 1 month ago