u/Flashy-Photograph-19

▲ 19 r/zfs

ZFS pool corrupted after both Proxmox host and TrueNAS VM wrote to disks simultaneously, need recovery advice

I recently migrated my 3x2TB RAIDZ1 pool (named storage) from a TrueNAS VM to the Proxmox host itself so I could free up some RAM. Last night a power fluctuation rebooted my server (I know, I need a UPS). I had forgotten that the TrueNAS VM was still set to auto-start at boot. So after the reboot, both the Proxmox host (which had the pool imported) and the TrueNAS VM (which also tried to import the pool) were writing to the same disks at the same time.
Now the pool is corrupted. Here's what I've tried and the current state.

zpool import
  pool: storage
    id: 4106071641955219047
 state: FAULTED
status: The pool metadata is corrupted.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using the '-f' flag.
config:
        storage            FAULTED  corrupted data
          raidz1-0         ONLINE
            disk1          ONLINE
            disk2          ONLINE
            disk3          ONLINE

What I've attempted while searching for solutions:

  • zpool import -f storage > I/O error
  • zpool import -F -n storage > no output
  • zpool import -FX -o readonly=on storage > cannot import: one or more devices currently unavailable
  • zdb -e -AAA storage > shows configuration, then zdb: can't open 'storage': Input/output error
  • zdb -e -y storage > same I/O error
  • Forced import with -T using a known good txg from labels (all three disks show txg 1673045 in zdb -l): zpool import -d /dev/disk/by-id -o readonly=on -f -T 1673045 storage > one or more devices is currently unavailable
  • zpool import -d /dev/disk/by-id -o readonly=on -f -F storage > I/O error, destroy and re-create from backup
  • zdb -l on each disk shows the same pool GUID, same txg=1673045, and all labels appear intact.
  • SMART data on all three disks is clean (no reallocated or pending sectors).
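
For reference, the label comparison from the zdb -l bullet can be scripted roughly like this. It is only a sketch: it assumes each label in the zdb -l output carries a line of the form "txg: <number>", the helper name label_txgs is made up for illustration, and the device paths in the commented usage are placeholders, not my real ones.

```shell
# Print the distinct txg values found in `zdb -l` label output read from stdin.
# Assumption: each label contains a line of the form "txg: <number>".
# Lines such as "create_txg: 4" are ignored because the first field differs.
label_txgs() {
    awk '$1 == "txg:" { print $2 }' | sort -un
}

# Hypothetical usage (placeholder device paths):
# for d in /dev/disk/by-id/DISK1 /dev/disk/by-id/DISK2 /dev/disk/by-id/DISK3; do
#     printf '%s: ' "$d"
#     zdb -l "$d" | label_txgs | tr '\n' ' '
#     echo
# done
```

If every disk prints exactly one value and it is the same on all three, the labels agree on the last txg they recorded. That says nothing about whether the metadata behind that txg is still readable.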

Is there anything I'm missing? Is the data gone for good?

u/Flashy-Photograph-19 — 7 days ago
▲ 143 r/homelab

I’m running Proxmox and recently moved all my services from individual LXCs into a single Docker setup inside one LXC. Everything’s been fine except for Immich.

Whenever I do a bulk upload from the Android app, Immich ends up crashing the entire Proxmox host. It’s not just the container — the whole server becomes unresponsive and I have to hard reset it. Even trying to shut down the LXC or the host doesn’t work once it happens.
The same behavior showed up earlier, when Immich was running in its own dedicated LXC.

Immich storage is on ZFS. I’m planning to add more RAM soon.
I’m wondering if anyone else has run into this or has ideas on what could be causing it?
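
Since RAM is my main suspect, one way to check would be to log the ZFS ARC size and available memory on the host during an upload. This is a rough sketch, not a confirmed diagnosis: it assumes the host exposes /proc/spl/kstat/zfs/arcstats with a "size" row and /proc/meminfo with a "MemAvailable:" row, and the helper names are made up for illustration.

```shell
# Extract the current ARC size in bytes from arcstats-formatted text on stdin.
# Assumption: rows look like "<name> <type> <data>" and one is named "size".
arc_size_bytes() {
    awk '$1 == "size" { print $3 }'
}

# Extract MemAvailable in kB from /proc/meminfo-formatted text on stdin.
mem_available_kb() {
    awk '$1 == "MemAvailable:" { print $2 }'
}

# Hypothetical usage on the Proxmox host while the upload runs:
# while sleep 5; do
#     echo "$(date +%T) arc_bytes=$(arc_size_bytes < /proc/spl/kstat/zfs/arcstats) avail_kb=$(mem_available_kb < /proc/meminfo)"
# done >> /root/immich-mem.log
```

If available memory drops toward zero right before the hang, that would point at memory exhaustion. If it stays healthy, the cause is probably something else.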

u/Flashy-Photograph-19 — 17 days ago