u/cewong2

r/btrfs

BTRFS RAID 5 Pool Data Recovery

This is a cross-post…

I’m still very new to having a home NAS. I thought it would be good to get it off of Google so I can have more control and save some money…

So I bought a Zettlab D8 Ultra, and at some point while I was playing with settings and restarting the system (shutting down through the OS and powering back on) my pool became unreadable.

It seems the ZettOs uses BTRFS on top of what looks like an mdadm-managed RAID 5 array. I’ve been using Gemini to try to recover the data, but I seem to be going in circles and I’m wondering if I need to give up. I asked Gemini for a summary of what has been done, and what follows is a straight copy-paste of its output. I’d like to recover the files, so if anyone has suggestions on what to try, I’m happy to give it a shot.

---

BTRFS on Linux Software RAID5 recovery help - Parent transid verify failed & Corrupt Chunk Tree after stripping bcache

**TL;DR:**

Recovering a 21TB storage pool (4x 24TB HDDs in RAID5). Originally wrapped in bcache with BTRFS on top. Bcache stripped cleanly (dirty_data was 0.0k). The BTRFS superblock is readable through a loop device at the correct offset, but the filesystem won't mount or restore due to a corrupt chunk tree (`cannot read chunk root`) and `parent transid verify failed` on all backup roots. Standard tools are core-dumping, possibly due to array size. Looking for advanced recovery paths.

***

### 1. Hardware & Environment Stack

* **Hardware:** Zettlab D8 Ultra (Intel x86) with 4x 24TB HDDs.

* **OS / Environment:** Ubuntu Live USB.

* **Storage Layout:** Physical drives form a healthy Linux Software RAID 5 array (`/dev/md126` / `/dev/md127`). A bcache caching layer wrapper (`bcache0`) was mapped on top, formatted with BTRFS.

---

### 2. What We Have Done So Far

#### Phase A: Removing Bcache & Finding the BTRFS Offset

Because the block layer was confirmed healthy and `dirty_data` was verified at `0.0k` (cache fully flushed to the RAID), we opted to strip the bcache header directly from the production disks to expose the raw BTRFS pool without a massive 21TB clone image.
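
Before stripping the header, the `dirty_data` check can be made explicit rather than eyeballed. The sketch below is an illustration, not part of the original session: `bcache_dirty_is_zero` is a hypothetical helper name, and it assumes `dirty_data` reports a human-readable size such as `0.0k` or `1.2G`.

```bash
# Hypothetical helper: succeed only if a bcache dirty_data reading is zero.
# Assumes the value looks like "0.0k" or "1.2G" (number plus optional unit).
bcache_dirty_is_zero() {
  awk -v v="$1" 'BEGIN {
    sub(/[kMGTPEZY]$/, "", v)                 # drop the unit suffix, if any
    if (v !~ /^[0-9]+(\.[0-9]+)?$/) exit 2    # not a number: refuse to guess
    exit (v + 0 == 0) ? 0 : 1
  }'
}

# Usage on the live system (path taken from the steps in this post):
#   bcache_dirty_is_zero "$(cat /sys/block/bcache0/bcache/dirty_data)" \
#     && echo "cache is flushed" || echo "do NOT strip the header yet"
```

A non-zero or unparseable reading makes the helper fail, so it errs on the side of not touching the disks.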

  1. **Stopped bcache device node:**

    ```bash
    # Stop the bcache device so it releases the backing array
    echo 1 | sudo tee /sys/block/bcache0/bcache/stop
    ```

  2. **Wiped bcache superblock signature:**

    ```bash
    # Erase the bcache signature at offset 0x1018 (destructive, writes to the array)
    sudo wipefs -o 0x1018 /dev/md126
    ```

  3. **Scanned for BTRFS magic string offset:**

    ```bash
    # Print the byte offset of each occurrence of the BTRFS magic string (reads the whole device)
    sudo grep --only-matching --byte-offset --text "_BHRfS_M" /dev/md126
    ```

    *Result:* Found `_BHRfS_M` at byte `73792`. The BTRFS primary superblock starts 64 KiB (`65536` bytes) into the filesystem, and the magic string sits 64 bytes (`0x40`) into the superblock, so the filesystem starts **`8192` bytes (8 KiB)** into the device (`73792 - 64 - 65536 = 8192`).

  4. **Mapped the exact 8 KiB offset boundary via loop device:**

    ```bash
    # Expose the array from byte 8192 onward as a loop device
    sudo losetup --find --show --offset 8192 --sector-size 512 /dev/md127
    ```

    *Result:* Mapped cleanly to `/dev/loop17`. Running `sudo btrfs inspect-internal dump-super /dev/loop17` decodes the filesystem parameters with no alignment or "bad magic" errors, so the offset mapping appears to be correct.
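
The offset arithmetic from step 3 can be written down as a small sketch. The two constants are the standard BTRFS on-disk positions (superblock at 64 KiB, magic field 64 bytes into it); `btrfs_fs_start` is a hypothetical helper name used only for illustration.

```bash
# All values are in bytes.
BTRFS_SUPER_OFFSET=65536   # primary superblock starts 64 KiB into the filesystem
BTRFS_MAGIC_OFFSET=64      # the magic field sits 0x40 bytes into the superblock

# Hypothetical helper: given the byte position where grep reported the magic
# string, print the byte offset at which the filesystem starts on the device.
btrfs_fs_start() {
  echo $(( $1 - BTRFS_MAGIC_OFFSET - BTRFS_SUPER_OFFSET ))
}

btrfs_fs_start 73792   # prints 8192
```

The printed value is what goes into `losetup --offset`. Adding `--read-only` to that `losetup` call would be a cautious option during recovery, since it keeps later tools from writing through the loop device.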

---

### 3. The Core Issue: Deep Metadata Corruption

Once the loop device opened the BTRFS superblock cleanly, we attempted multiple methods to mount or pull files. Every single approach has hit a brick wall due to multi-layered metadata corruption:

#### 1. Transaction ID Verification Failures (`parent transid verify failed`)

The kernel log (`dmesg`) reveals that the primary tree root and all four native fallback backup root slots (0, 1, 2, and 3) are corrupt. The transaction generations expected by the superblock (`46632` to `46635`) completely mismatch the actual generations found on those blocks (ranging wildly from `47544` to `146281`).
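
To compare the expected and found generations side by side, the kernel messages can be reduced to a short table. This is a minimal sketch: `transid_gaps` is a hypothetical helper, and it assumes each message contains the words `wanted` and `found`, each followed by a number, as `parent transid verify failed` lines normally do.

```bash
# Hypothetical helper: read kernel log lines on stdin and, for every
# "parent transid verify failed" message, print the expected generation,
# the generation actually found, and the difference between them.
transid_gaps() {
  awk '/parent transid verify failed/ {
    wanted = ""; found = ""
    for (i = 1; i < NF; i++) {
      if ($i == "wanted") wanted = $(i + 1)
      if ($i == "found")  found  = $(i + 1)
    }
    if (wanted != "" && found != "")
      printf "wanted=%s found=%s delta=%d\n", wanted, found, found - wanted
  }'
}

# Usage on the live system:
#   sudo dmesg | transid_gaps
```

A found generation that is higher than the wanted one means the block on disk is newer than the tree pointing at it, which is worth stating explicitly when asking for help.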

#### 2. Broken Chunk Tree Map

We extracted the raw root addresses from the superblock layout:

```bash
# Dump the full superblock and show the backup_roots section
sudo btrfs inspect-internal dump-super -f /dev/loop17 | grep -A 25 -E "backup_roots"
```

*Result:* All four backup slots reference the same chunk root address (`26017792`) at generation `46596`, and this chunk root is unreadable. Without a working chunk tree, BTRFS cannot translate logical addresses into physical offsets on the device, so no other tree can be located.
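
The claim that all four slots share one chunk root can be checked mechanically. The sketch below assumes the `dump-super -f` output contains lines of the form `backup_chunk_root: <address> gen: <generation> level: <level>`; `distinct_chunk_roots` is a hypothetical helper name.

```bash
# Hypothetical helper: read dump-super output on stdin and print each
# distinct backup chunk root as "<address> <generation>".
distinct_chunk_roots() {
  awk '$1 == "backup_chunk_root:" { print $2, $4 }' | sort -u
}

# Usage on the live system:
#   sudo btrfs inspect-internal dump-super -f /dev/loop17 | distinct_chunk_roots
```

A single output line confirms there is no older chunk root among the backup slots to fall back to.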

#### 3. Command Failures Encountered:

* `sudo mount -t btrfs -o ro,usebackuproot /dev/loop17 /mnt/recovery` -> Fails with `open_ctree failed: -5`.

* `sudo btrfs restore /dev/loop17 /tmp` -> Aborts immediately with `cannot read chunk root`.

* `sudo mount -t btrfs -o ro,rescue=usebackuproot,rescue=ignorebadroots,rescue=nolog /dev/loop17 /mnt/recovery` -> Kernel rejects the mount due to the corrupt chunk tree maps.

* `sudo btrfs-find-root /dev/loop17` -> Triggers a `Bus error (core dumped)`. The cause is unconfirmed; it may be running out of memory while scanning the 21TB device sequentially from a Live USB environment.
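
Since the out-of-memory explanation for the `Bus error` is only a guess, it may help to look at available memory before re-running the scan. A minimal sketch, assuming the usual `/proc/meminfo` layout with a `MemAvailable:` line in kB; `mem_available_mib` is a hypothetical helper name.

```bash
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# the available memory in MiB (the kernel reports it in kB).
mem_available_mib() {
  awk '$1 == "MemAvailable:" { printf "%d\n", $2 / 1024 }'
}

# Usage on the live system:
#   mem_available_mib < /proc/meminfo
#   sudo dmesg | grep -i -E "out of memory|oom-kill"   # any OOM events logged?
```

If memory is plentiful and the kernel log shows no OOM events, the crash likely has another cause and the memory theory can be dropped from the summary.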

---

### Current Status

* The physical disks and RAID5 parity are completely healthy and active.

* The structural block offset (`8192` bytes) is completely correct and verified.

* Standard kernel mount paths and user-space recovery utilities (`btrfs restore`) are blocked by the broken chunk tree.

u/cewong2 — 7 days ago