Sanity check: Embedded Linux storage architecture for a remote device (A/B updates, OverlayFS, strict RO RootFS)
Hey folks,
I'm working on the OS architecture for an ultra-remote, autonomous gateway device. Once it is deployed, physical access is no longer possible and communication bandwidth is quite low.
We use Yocto to build our BSP. I'd love to get a sanity check from the community on our storage and filesystem architecture before we lock it in.
Here is the rundown of our approach:
1. Hardware & Boot Hierarchy
We have an external hardware MCU that controls the boot pins to provide a 3-tier failsafe:
- Tier 1 (Golden Rescue): QSPI Flash. Strictly read-only monolithic image (bootloader, minimal kernel, initramfs). Only booted if block devices completely fail.
- Tier 2 (Primary Prod): eMMC.
- Tier 3 (Dev/Secondary Fallback): SD Card.
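To make the fallback order concrete, here is a minimal model of the selection logic in Python. It is only a sketch: the real logic lives in the MCU firmware, and the `healthy` input (how the MCU learns that a medium failed to boot) and all names are hypothetical.

```python
from typing import Dict

# Order taken from the tier list above: eMMC first, SD card second,
# QSPI rescue image only when both block devices have failed.
BLOCK_MEDIA = ("emmc", "sd")
RESCUE_MEDIUM = "qspi"


def pick_boot_medium(healthy: Dict[str, bool]) -> str:
    """Return the medium the boot pins should select.

    `healthy` maps a medium name to whether it is still considered
    bootable (hypothetical input; the detection mechanism is not shown).
    """
    for medium in BLOCK_MEDIA:
        if healthy.get(medium, False):
            return medium
    # Neither block device is usable: fall back to the read-only rescue image.
    return RESCUE_MEDIUM
```

With this model, `pick_boot_medium({"emmc": False, "sd": True})` selects the SD card, and an empty or all-false map selects the QSPI rescue image.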
2. Partition Layout
Both the eMMC and SD card use an identical 4-partition block layout:
- BOOT (FAT32)
- RootFS-A (EXT4)
- RootFS-B (EXT4)
- Data (EXT4 - persistent storage for logs/payload data)
3. Filesystem Permissions & State Management
- Production: RootFS-A and RootFS-B are strictly read-only by default. (The inactive RootFS slot and the BOOT partition only become temporarily writable during an OTA update.)
- Development: To keep engineering velocity high, we tweak the kernel bootargs via the U-Boot console to mount the active RootFS as read-write for local testing and application/library deployment.
- Volatile Data: /var and /tmp are mounted to RAM (tmpfs) to save flash wear. Critical post-mortem crash logs are explicitly written to the Data partition before a watchdog reboot.
- Persistent State: We use OverlayFS for paths like /etc and /home. The upperdir lives on the Data partition of the currently active boot medium.
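For reference, the persistent-state overlay boils down to a mount call along these lines. This is a sketch that only builds the `mount` command rather than running it, and it assumes the Data partition is mounted at `/data` with one `upper`/`work` pair per overlaid path; that directory layout is illustrative, not our actual one. The one hard constraint is that OverlayFS requires `workdir` to be on the same filesystem as `upperdir`.

```python
from pathlib import PurePosixPath
from typing import List

# Assumed mount point of the Data partition (illustrative).
DATA_MOUNT = PurePosixPath("/data")


def overlay_mount_cmd(target: str,
                      data_mount: PurePosixPath = DATA_MOUNT) -> List[str]:
    """Build the mount(8) argv that overlays `target` with a persistent upper layer."""
    # "/etc" -> "etc"; a nested path such as "/var/lib" -> "var_lib"
    name = target.strip("/").replace("/", "_")
    base = data_mount / "overlay" / name
    upper = base / "upper"
    # workdir must live on the same filesystem as upperdir
    work = base / "work"
    opts = f"lowerdir={target},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", opts, target]
```

For `/etc` this yields `mount -t overlay overlay -o lowerdir=/etc,upperdir=/data/overlay/etc/upper,workdir=/data/overlay/etc/work /etc`. The `upper` and `work` directories have to exist before the mount is attempted.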
4. Mitigating A/B Update Configuration Drift
Because we rely on Delta OTAs (due to the narrow bandwidth), we ran into the classic OverlayFS trap: if Slot B boots a newly updated app, it might read an outdated configuration schema left behind in the /etc overlay by Slot A.
- Our Fix: We enforce schema versioning in the directory structure itself. Apps read their configs from paths like /etc/myorg/app/v2.1.0/config.yaml. This allows old and new schemas to safely coexist in the persistent overlay.
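As a rough illustration of the lookup under this layout, the sketch below resolves the config for a given schema version. The fallback to the newest older version (as a migration source when the exact directory does not exist yet) is an assumption added for illustration, not something the scheme above defines; the function names are hypothetical as well.

```python
from pathlib import Path
from typing import Optional, Tuple

Version = Tuple[int, ...]


def parse_version(dirname: str) -> Optional[Version]:
    """Parse a directory name like 'v2.1.0' into (2, 1, 0); None if it does not match."""
    if not dirname.startswith("v"):
        return None
    try:
        return tuple(int(part) for part in dirname[1:].split("."))
    except ValueError:
        return None


def find_config(root: Path, wanted: str,
                filename: str = "config.yaml") -> Optional[Path]:
    """Return the config for `wanted`, else the newest older one, else None."""
    target = parse_version(wanted)
    if target is None:
        raise ValueError(f"not a version directory name: {wanted!r}")
    candidates = []
    if root.is_dir():
        for entry in root.iterdir():
            version = parse_version(entry.name)
            cfg = entry / filename
            # Never pick a schema newer than the one the app understands.
            if version is not None and version <= target and cfg.is_file():
                candidates.append((version, cfg))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

An app built for schema v2.1.0 would call `find_config(Path("/etc/myorg/app"), "v2.1.0")`. If only a `v2.0.0` directory exists, it gets that path back and can decide whether to migrate it or start from defaults.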
My questions for the community:
- Are there any hidden traps with the OverlayFS upperdir living on an ext4 partition that is susceptible to sudden power loss, assuming we mount it with aggressive fsck auto-repair flags?
- Is bypassing the RO RootFS via U-Boot for development a common practice, or are we asking for Dev/Prod parity trouble down the line?
- Does anyone see a glaring flaw in how we are handling the A/B configuration drift using versioned directory paths?
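On the power-loss question, the part an application can control is how it writes files onto the Data partition. A minimal sketch of the usual write-to-temp, fsync, rename pattern (for example for the crash logs from section 3) is below; the function name and the `.tmp` suffix are illustrative, and it assumes the temp file and the final file sit in the same directory on the same filesystem.

```python
import os
from pathlib import Path


def write_durably(path: Path, payload: bytes) -> None:
    """Write `payload` to `path` so a power cut leaves either the old or the new file."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "wb") as f:
        f.write(payload)
        f.flush()                  # push Python's buffer to the kernel
        os.fsync(f.fileno())       # ask the kernel to push the data to the device
    os.replace(tmp, path)          # atomic rename within one filesystem
    dir_fd = os.open(path.parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)           # persist the renamed directory entry too
    finally:
        os.close(dir_fd)
```

This does not protect against the storage device itself lying about flushes, so it complements rather than replaces the fsck step.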
Appreciate any ruthless critiques or advice you can offer!