u/Nuwen-Pham

PSA: Kernel 7.0.6 + NVIDIA 580.159.03 (open G06 signed KMP) is unstable on Tumbleweed

**PSA: Kernel 7.0.6 + NVIDIA 580.159.03 (open G06 signed KMP) is unstable on Tumbleweed, at least on an RTX 4090. Canary and rollback notes inside.**

Posting in case anyone else is sitting on this `zypper dup` with their finger hovering over the keyboard. I'd hold off, or at minimum treat it as a canary, not a routine update.

**What I ran into**

This morning I took a planned maintenance window to move from `kernel-default 6.19.12-1` to `7.0.6-1` on Tumbleweed. The NVIDIA candidate alongside it was `580.159.03`, and the signed open G06 KMP was built for `k7.0.5_1` (one patch level behind the kernel). Normally that's fine — openSUSE's weak-updates / kABI compatibility is designed for exactly this case, and the module did load cleanly on boot via weak-updates.

The boot itself succeeded. KDE/X11 session came up, `nvidia-smi` worked, `glxinfo` reported `NVIDIA GeForce RTX 4090/PCIe/SSE2` and `4.6.0 NVIDIA 580.159.03` — not llvmpipe, not nouveau. By every static check, it looked green.
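For anyone who wants to script the same kind of static gate, here is a minimal sketch of the renderer check. It is not the exact check I ran: `renderer_is_nvidia` is a hypothetical helper name, and it only looks at the `OpenGL renderer string` line from `glxinfo -B`, so treat it as a coarse filter.

```shell
# Coarse check: read `glxinfo -B` output on stdin and succeed only if the
# renderer line names NVIDIA and is not a llvmpipe or nouveau fallback.
# renderer_is_nvidia is a hypothetical helper name, not an existing tool.
renderer_is_nvidia() {
    # `|| true` tolerates "no match" so the case statement decides the result.
    renderer=$(grep -i 'OpenGL renderer string' | head -n 1) || true
    case "$renderer" in
        *llvmpipe*|*nouveau*) return 1 ;;  # software or nouveau fallback
        *NVIDIA*)             return 0 ;;  # e.g. "NVIDIA GeForce RTX 4090/PCIe/SSE2"
        *)                    return 1 ;;  # missing or unrecognised renderer
    esac
}
```

Usage would be something like `glxinfo -B | renderer_is_nvidia && echo "renderer OK"`, alongside `modinfo -F version nvidia` and `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader` for the version side. A pass here only says the stack is wired up correctly; it says nothing about runtime stability.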

Then I checked the journal.

**The actual failure**

From the moment the session started, the journal logged these at a sustained rate of roughly one event every few seconds:

```
NVRM: RC watchdog: GPU is probably locked
NVRM: Xid (PCI:0000:01:00): 8, pid=..., name=kwin_x11
NVRM: Xid (PCI:0000:01:00): 8, pid=..., name=Xorg.bin
kwin_x11: A graphics reset attributable to the current GL context occurred
```

This ran continuously for the entire ~6 minutes I left it up. The desktop was technically usable but KWin was clearly fighting the driver the whole time. For a trading workstation that needs to be boring, this was a hard no-go.
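If you want a quick way to see whether your own box is doing the same thing, a small filter over the kernel log is enough. This is a sketch, assuming the Xid lines look like the ones quoted above; `xid_summary` is a name I made up for illustration.

```shell
# Summarise NVRM Xid events from kernel log text on stdin.
# Prints one line per Xid code in the form "<count> Xid <code>".
# xid_summary is a hypothetical helper name; it assumes the
# "NVRM: Xid (<pci address>): <code>, ..." line format quoted above.
xid_summary() {
    sed -n 's/.*NVRM: Xid ([^)]*): \([0-9][0-9]*\),.*/\1/p' |
        sort -n |
        uniq -c |
        awk '{ print $1 " Xid " $2 }'
}
```

Feed it the current boot's kernel messages with `journalctl -k -b --no-pager | xid_summary`. No output means no Xid lines matched, which is what a healthy session should look like.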

**Things I want to be clear about**

- Hardware: i9-14900K / RTX 4090 / dual NVMe Btrfs RAID1. The card is not flaky — it ran 6.19.12 + 580.126.18 cleanly before and after the canary.

- The KMP trailing the kernel by one patch level was **not** the problem. Weak-updates did its job; `modinfo nvidia` confirmed the module loaded and the version matched 580.159.03.

- This was not a packaging/repo incoherence either. Userspace, KMP, common, meta, gl, video, compute — all coherent on the 580.159.03 branch.

- Btrfs was clean. SELinux was Enforcing. Initramfs hygiene was clean. No nouveau, no llvmpipe fallback.

- This was a runtime stability problem in the kernel 7.0.6 / NVIDIA 580.159.03 combination itself, on this hardware, in this driver branch.

**Rollback**

Snapper rollback to the pre-`dup` zypp transaction snapshot, reboot, done. Back on `6.19.12-1-default` + `580.126.18`, zero NVRM/Xid events in the journal, GPU idling at 48°C/33W like normal. The parachute worked exactly as designed — this is genuinely one of the things openSUSE gets right.
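For reference, the rollback can be sketched roughly as below. Assumptions: the default openSUSE Btrfs + snapper layout, and a snapshot number you have already picked out of `sudo snapper list` (the zypp pre/post pairs are the ones to look for). `run` and `rollback_to` are hypothetical helper names, and `DRY_RUN=1` only prints the commands so you can eyeball them first.

```shell
# Print the command instead of running it when DRY_RUN=1.
# The printed form is for inspection only (argument quoting is not preserved).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Roll back to the given snapper snapshot number, then reboot into it.
# rollback_to is a hypothetical helper; pass the number of the *pre* snapshot
# taken before the update.
rollback_to() {
    snap=$1
    case "$snap" in
        ''|*[!0-9]*)
            echo "usage: rollback_to <snapshot-number>" >&2
            return 2
            ;;
    esac
    run sudo snapper rollback "$snap" &&
        run sudo systemctl reboot
}
```

Try `DRY_RUN=1 rollback_to <number>` first, with the snapshot number from your own `snapper list` output. The new default subvolume only takes effect after the reboot.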

I then added relational locks (`>= 7` on kernel-default and `>= 580.159` on the NVIDIA stack) to block the failed combo from coming back on the next `dup` without freezing me on today's exact builds. When a newer NVIDIA branch lands or 7.1+ ships, I'll re-canary.
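A rough sketch of what such a lock can look like, with caveats: this assumes `zypper addlock` accepts a `name >= version` string the way the relational locks above imply (check `man zypper` on your system), and the package names you pass for the NVIDIA stack are up to you, since I'm not listing mine here. `run` and `lock_from` are hypothetical helper names.

```shell
# Print the command instead of running it when DRY_RUN=1.
# The printed form is for inspection only (argument quoting is not preserved).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Lock every version of a package at or above a minimum version.
# lock_from is a hypothetical helper. It ASSUMES zypper accepts a
# "name >= version" lock specification; verify that before relying on it.
lock_from() {
    pkg=$1
    min=$2
    if [ -z "$pkg" ] || [ -z "$min" ]; then
        echo "usage: lock_from <package> <min-version>" >&2
        return 2
    fi
    run sudo zypper addlock "$pkg >= $min"
}
```

For example, `DRY_RUN=1 lock_from kernel-default 7` prints the command it would run. Afterwards `zypper locks` shows what is in place, and `zypper removelock` undoes it when it is time to re-canary.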

**Questions for anyone who's been here**

  1. Has anyone successfully run kernel 7.0.x with NVIDIA 580.159.03 on Ada-generation cards (4090/4080/4070)? I'm curious whether this is RTX 4090-specific, Ada-wide, or broader.

  2. Anyone hit Xid 8 specifically with this combo, or different Xid codes?

  3. Anyone tried the proprietary G06 driver path instead of the open signed KMP on kernel 7?

Not a bug report — I haven't filed one yet, want to see if this is a known pattern or a Tungsten-specific thing first. If others confirm, I'll write it up properly.

Moral of the story: even when every static gate is green and the module loads, the runtime can still bite you. Canary your kernel transitions, especially when NVIDIA branches change at the same time. The fact that I could be back on a working system in under ten minutes is the entire reason I trust Tumbleweed for a workstation that has to be available Monday morning.

u/Nuwen-Pham — 6 days ago