u/cdokme

Sanity check: Embedded Linux storage architecture for a remote device (A/B updates, OverlayFS, strict RO RootFS)

Hey folks,

I'm working on the OS architecture for an ultra-remote, autonomous gateway device. Once it is deployed, physical access is no longer possible and the communication bandwidth is very limited.

We use Yocto to build our BSP. I'd love to get a sanity check from the community on our storage and filesystem architecture before we lock it in.

Here is the rundown of our approach:

1. Hardware & Boot Hierarchy

We have an external hardware MCU that controls the boot pins to provide a 3-tier failsafe:

  • Tier 1 (Golden Rescue): QSPI Flash. Strictly read-only monolithic image (bootloader, minimal kernel, initramfs). Booted only if both block devices (eMMC and SD card) fail completely.
  • Tier 2 (Primary Prod): eMMC.
  • Tier 3 (Dev/Secondary Fallback): SD Card.
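
To make the fallback order concrete, here is a minimal Python sketch. It assumes the order is eMMC first, then SD card, then the QSPI rescue image. `BOOT_ORDER`, `select_boot_medium`, and the `is_bootable` probe are hypothetical names for illustration only; on the device the decision is made by the MCU through the boot pins.

```python
from typing import Callable, Optional

# Assumed attempt order: primary eMMC, SD card fallback, QSPI rescue last.
BOOT_ORDER = ["emmc", "sdcard", "qspi"]


def select_boot_medium(is_bootable: Callable[[str], bool]) -> Optional[str]:
    """Return the first medium in BOOT_ORDER that passes the health probe.

    `is_bootable` is a hypothetical callback standing in for whatever
    check the MCU performs (boot heartbeat, timeout, and so on).
    """
    for medium in BOOT_ORDER:
        if is_bootable(medium):
            return medium
    return None  # no medium could be booted


if __name__ == "__main__":
    # Simulate a dead eMMC: the SD card should be chosen next.
    print(select_boot_medium(lambda medium: medium != "emmc"))
```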

2. Partition Layout

Both the eMMC and SD card use an identical 4-partition block layout:

  1. BOOT (FAT32)
  2. RootFS-A (EXT4)
  3. RootFS-B (EXT4)
  4. Data (EXT4 - persistent storage for logs/payload data)
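
As a rough illustration, the layout and the A/B slot relationship can be written down as a small Python table. The partition numbers and labels come from the list above; the helper names (`inactive_slot`, `ota_target_partition`) are hypothetical, and no device node names are assumed.

```python
# Partition number -> (label, filesystem), as listed in the layout above.
PARTITIONS = {
    1: ("BOOT", "vfat"),
    2: ("RootFS-A", "ext4"),
    3: ("RootFS-B", "ext4"),
    4: ("Data", "ext4"),
}

# Which partition number backs each A/B slot.
SLOT_PARTITION = {"A": 2, "B": 3}


def inactive_slot(active: str) -> str:
    """Return the slot that is not currently booted."""
    if active not in SLOT_PARTITION:
        raise ValueError(f"unknown slot: {active!r}")
    return "B" if active == "A" else "A"


def ota_target_partition(active: str) -> int:
    """Return the partition number an OTA update would write to."""
    return SLOT_PARTITION[inactive_slot(active)]


if __name__ == "__main__":
    target = ota_target_partition("A")
    print(target, PARTITIONS[target])
```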

3. Filesystem Permissions & State Management

  • Production: RootFS-A and RootFS-B are strictly Read-Only by default. (The inactive RootFS slot and the BOOT partition only become temporarily writable during an OTA update).
  • Development: To keep engineering velocity high, we tweak the kernel bootargs via the U-Boot console to mount the active RootFS as Read-Write for local testing and application/library deployment.
  • Volatile Data: /var and /tmp are mounted as tmpfs (RAM-backed) to reduce flash wear. Critical post-mortem crash logs are explicitly written to the Data partition before a watchdog reboot.
  • Persistent State: We use OverlayFS for paths like /etc and /home. The upperdir lives on the Data partition of the currently active boot medium.
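
For anyone who wants to picture the overlay setup, here is a minimal Python sketch that only builds the `mount` command line for one overlaid path. The `/data/overlay` root and the `upper`/`work` directory names are assumptions for illustration. One real constraint it reflects: OverlayFS requires `workdir` to be an empty directory on the same filesystem as `upperdir`, so both sit on the Data partition here.

```python
from pathlib import PurePosixPath
from typing import List


def overlay_mount_args(target: str, data_root: str = "/data/overlay") -> List[str]:
    """Build the argv for mounting an overlay on top of `target`.

    `data_root` is a hypothetical location on the Data partition.
    The read-only RootFS copy of `target` acts as the lower layer.
    """
    name = target.strip("/").replace("/", "_")  # "/etc" -> "etc"
    base = PurePosixPath(data_root) / name
    upper = base / "upper"  # persistent changes land here
    work = base / "work"    # scratch dir, must share a filesystem with upper
    options = f"lowerdir={target},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, target]


if __name__ == "__main__":
    # Print the commands instead of running them; mounting needs root.
    for path in ("/etc", "/home"):
        print(" ".join(overlay_mount_args(path)))
```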

4. Mitigating A/B Update Configuration Drift

Because we rely on Delta OTAs (due to the narrow bandwidth), we ran into the classic OverlayFS trap: if Slot B boots a newly updated app, it might read an outdated configuration schema left behind in the /etc overlay by Slot A.

  • Our Fix: We enforce schema versioning in the directory structure itself. Apps read their configs from paths like /etc/myorg/app/v2.1.0/config.yaml. This allows old and new schemas to safely coexist in the persistent overlay.
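
A minimal Python sketch of the path scheme, for illustration only. The `/etc/myorg/<app>/v<version>/config.yaml` pattern is the one described above; the function names are hypothetical, and the assumption that version directories are purely numeric dotted strings (such as `v2.1.0`) is mine.

```python
from pathlib import Path
from typing import List, Tuple


def config_path(app: str, schema_version: str, root: str = "/etc/myorg") -> Path:
    """Return the config file an app build with this schema version reads."""
    return Path(root) / app / f"v{schema_version}" / "config.yaml"


def parse_version(dirname: str) -> Tuple[int, ...]:
    """Turn 'v2.10.0' into (2, 10, 0) so versions sort numerically."""
    return tuple(int(part) for part in dirname.lstrip("v").split("."))


def installed_schema_versions(app_dir: Path) -> List[str]:
    """List the schema version directories coexisting under one app."""
    versions = []
    for entry in app_dir.iterdir():
        if not (entry.is_dir() and entry.name.startswith("v")):
            continue
        try:
            parse_version(entry.name)
        except ValueError:
            continue  # skip directories that are not numeric versions
        versions.append(entry.name)
    return sorted(versions, key=parse_version)


if __name__ == "__main__":
    print(config_path("app", "2.1.0").as_posix())
```

Sorting on the parsed tuple rather than the raw string keeps `v2.10.0` after `v2.9.0`, which matters once several schema versions pile up in the overlay.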

My questions for the community:

  1. Are there any hidden traps with the OverlayFS upperdir living on an ext4 partition that is susceptible to sudden power loss, assuming we run fsck with aggressive auto-repair flags before mounting it?
  2. Is bypassing the RO RootFS via U-Boot for development a common practice, or are we asking for Dev/Prod parity trouble down the line?
  3. Does anyone see a glaring flaw in how we are handling the A/B configuration drift using versioned directory paths?

Appreciate any ruthless critiques or advice you can offer!

u/cdokme — 7 days ago