u/cdokme

Hey folks,

I'm working on the OS architecture for an ultra-remote, autonomous gateway device. Once it is deployed, physical access is no longer possible and communication bandwidth is very low.

We use Yocto to build our BSP. I'd love to get a sanity check from the community on our storage and filesystem architecture before we lock it in.

Here is the rundown of our approach:

1. Hardware & Boot Hierarchy

We have an external hardware MCU that controls the boot pins to provide a 3-tier failsafe:

Tier 1 (Golden Rescue): QSPI Flash. Strictly read-only monolithic image (bootloader, minimal kernel, initramfs). Only booted if block devices completely fail.
Tier 2 (Primary Prod): eMMC.
Tier 3 (Dev/Secondary Fallback): SD Card.
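
To make the fallback order concrete, here is a rough sketch of the decision the MCU makes, written in Python purely for illustration (the real logic lives in MCU firmware). The failed-boot counter, the `max_attempts` threshold, and all names are assumptions for the sake of the example, not our actual implementation.

```python
from enum import Enum


class BootMedium(Enum):
    EMMC = "emmc"  # Tier 2: primary production
    SD = "sd"      # Tier 3: dev / secondary fallback
    QSPI = "qspi"  # Tier 1: golden rescue image


# Block devices are tried in this order; QSPI is only the last resort.
BLOCK_DEVICE_ORDER = [BootMedium.EMMC, BootMedium.SD]


def select_boot_medium(failed_boots, max_attempts=3):
    """Pick the boot medium from per-medium consecutive failure counts.

    failed_boots maps BootMedium -> number of consecutive failed boots
    (hypothetical bookkeeping kept by the MCU).
    """
    for medium in BLOCK_DEVICE_ORDER:
        if failed_boots.get(medium, 0) < max_attempts:
            return medium
    # Every block device exhausted its attempts: boot the rescue image.
    return BootMedium.QSPI
```

With no recorded failures this picks the eMMC, and it only reaches QSPI once both the eMMC and the SD card have used up their attempts.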

2. Partition Layout

Both the eMMC and SD card use an identical 4-partition block layout:

BOOT (FAT32)
RootFS-A (EXT4)
RootFS-B (EXT4)
Data (EXT4 - persistent storage for logs/payload data)
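
As a small illustration of how the A/B slots relate to this layout, the sketch below maps a slot to its partition and picks the OTA target. The partition numbers simply follow the order listed above and are an assumption, as are the function names.

```python
# Partition numbers follow the listed order (assumed, not confirmed).
PARTITIONS = {1: "BOOT", 2: "RootFS-A", 3: "RootFS-B", 4: "Data"}


def ota_target(active_slot):
    """Return the inactive slot, i.e. the one an OTA update writes to."""
    if active_slot not in ("A", "B"):
        raise ValueError(f"unknown slot: {active_slot!r}")
    return "B" if active_slot == "A" else "A"


def rootfs_partition(slot):
    """Return the partition number holding the RootFS for a slot."""
    label = f"RootFS-{slot}"
    for number, name in PARTITIONS.items():
        if name == label:
            return number
    raise ValueError(f"no partition for slot: {slot!r}")
```

For example, when slot A is active, `rootfs_partition(ota_target("A"))` gives the partition an update would write to.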

3. Filesystem Permissions & State Management

Production: RootFS-A and RootFS-B are strictly Read-Only by default. (The inactive RootFS slot and the BOOT partition only become temporarily writable during an OTA update).
Development: To keep engineering velocity high, we tweak the kernel bootargs via the U-Boot console to mount the active RootFS as Read-Write for local testing and application/library deployment.
Volatile Data: /var and /tmp are mounted in RAM (tmpfs) to reduce flash wear. Critical post-mortem crash logs are explicitly written to the Data partition before a watchdog reboot.
Persistent State: We use OverlayFS for paths like /etc and /home. The upperdir lives on the Data partition of the currently active boot medium.
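
For reference, this is roughly how the overlay mounts are put together. The sketch only builds the option string and the mount command, it does not run anything. The `/data/overlay` location and the per-path `upper`/`work` layout are assumptions for illustration; `lowerdir`, `upperdir`, and `workdir` are the standard overlay mount options, and the workdir has to sit on the same filesystem as the upperdir.

```python
from pathlib import PurePosixPath


def overlay_options(lower, data_root="/data/overlay"):
    """Build the overlay option string for one persistent path.

    data_root is a hypothetical directory on the Data partition; upper
    and work share it so both stay on the same filesystem.
    """
    lower_path = PurePosixPath(lower)
    base = PurePosixPath(data_root) / lower_path.name
    return (
        f"lowerdir={lower_path},"
        f"upperdir={base / 'upper'},"
        f"workdir={base / 'work'}"
    )


def overlay_mount_command(lower, data_root="/data/overlay"):
    """Return the argv for mounting the overlay on top of `lower`."""
    options = overlay_options(lower, data_root)
    return ["mount", "-t", "overlay", "overlay", "-o", options, str(lower)]
```

Calling `overlay_mount_command("/etc")` yields a command that mounts the overlay over /etc with its upperdir under the assumed Data partition path.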

4. Mitigating A/B Update Configuration Drift

Because we rely on Delta OTAs (due to the narrow bandwidth), we ran into the classic OverlayFS trap: if Slot B boots a newly updated app, it might read an outdated configuration schema left behind in the /etc overlay by Slot A.

Our Fix: We enforce schema versioning in the directory structure itself. Apps read their configs from paths like /etc/myorg/app/v2.1.0/config.yaml. This allows old and new schemas to safely coexist in the persistent overlay.
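
Here is a minimal sketch of how an app could resolve its config under this scheme. The fallback to the newest older schema directory is an assumption added for illustration (a real app might instead migrate or regenerate its config), and the function names are hypothetical. Versions are compared as integer tuples so that v2.10.0 sorts after v2.9.0.

```python
from pathlib import Path


def parse_version(name):
    """Parse 'v2.1.0' into (2, 1, 0); return None for anything else."""
    if not name.startswith("v"):
        return None
    try:
        return tuple(int(part) for part in name[1:].split("."))
    except ValueError:
        return None


def resolve_config(app_root, app_version, filename="config.yaml"):
    """Return the config path for the newest schema <= app_version.

    app_root is the per-app directory, e.g. /etc/myorg/app. Returns
    None when no suitable versioned directory holds the file.
    """
    wanted = parse_version(app_version)
    if wanted is None:
        raise ValueError(f"bad version string: {app_version!r}")
    root = Path(app_root)
    if not root.is_dir():
        return None
    candidates = []
    for entry in root.iterdir():
        version = parse_version(entry.name)
        config = entry / filename
        if version is not None and version <= wanted and config.is_file():
            candidates.append((version, config))
    if not candidates:
        return None
    # Pick the highest schema version that the app can still read.
    return max(candidates, key=lambda item: item[0])[1]
```

An app built for v2.1.0 would never pick up a v3.0.0 directory written by a newer slot, which is the coexistence property we are relying on.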

My questions for the community:

Are there any hidden traps with the OverlayFS upperdir living on an ext4 partition that is susceptible to sudden power loss, assuming we mount it with aggressive fsck auto-repair flags?
Is bypassing the RO RootFS via U-Boot for development a common practice, or are we asking for Dev/Prod parity trouble down the line?
Does anyone see a glaring flaw in how we are handling the A/B configuration drift using versioned directory paths?

Appreciate any ruthless critiques or advice you can offer!

Sanity check: Embedded Linux storage architecture for a remote device (A/B updates, OverlayFS, strict RO RootFS)