Container → microVM is not the finish line. Your isolation boundary is not in the Guest kernel. It's in that root process on your host called virtiofsd.
1. Everyone just moved house
For the past six months, every vendor still serious about agent sandboxes has been telling the same story:
*Shared kernels are over. We've upgraded to Firecracker / Kata / Cloud Hypervisor. Each tenant gets its own Guest kernel = hardware-level isolation = safe.*
That story is more honest than the shared-kernel one. That's all it is.
E2B prints "Firecracker" on its homepage. Modal blogs about gVisor. Kata is the silver bullet of the K8s crowd. The spec sheets promise a 90 ms cold start, a Rust codebase, and 5 MiB of memory overhead. Sounds airtight.
Until you run `ps aux | grep -E '(virtiofsd|vhost)'` on the host.
2. virtiofsd: the root daemon sitting next door
To give the Guest near-native-speed access to host volumes, the usual microVM stack (Kata Containers on Cloud Hypervisor or QEMU, for example) runs a host daemon called virtiofsd, which serves the Guest's virtio-fs device over a vhost-user socket. What permissions does it have?
Host root.
Not a misconfiguration: it's by design. It has to act on the host filesystem on the Guest's behalf, creating files with whatever owners and modes the Guest asks for.
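You can verify this on your own host by reading each process's status file directly instead of scanning ps output. Below is a minimal sketch. It assumes a Linux procfs in which `/proc/<pid>/status` contains `Name:` and `Uid:` lines, where `Uid:` lists the real, effective, saved, and filesystem UIDs. The helper names `parse_status` and `scan_proc` are our own and do not come from any existing tool.

```python
import os
import re

# Same process names as the grep above: virtiofsd plus vhost-<pid> workers.
PATTERN = re.compile(r"virtiofsd|vhost")


def parse_status(text):
    """Extract the process name and UIDs from /proc/<pid>/status text."""
    info = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            info["name"] = value.strip()
        elif key == "Uid":
            # Four fields: real, effective, saved set-user-ID, filesystem UID.
            real, effective, _saved, fs = (int(v) for v in value.split())
            info.update(uid=real, euid=effective, fsuid=fs)
    return info


def scan_proc(root="/proc"):
    """Yield (pid, info) for every process whose name matches PATTERN."""
    for entry in os.listdir(root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(root, entry, "status")) as f:
                info = parse_status(f.read())
        except OSError:
            # The process exited between listdir() and open().
            continue
        if PATTERN.search(info.get("name", "")):
            yield int(entry), info


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, info in scan_proc():
        # Flag anything running with root's effective or filesystem UID.
        flag = "HOST ROOT" if 0 in (info.get("euid"), info.get("fsuid")) else ""
        print(f"{pid:>7}  {info['name']:<16} euid={info.get('euid')}  {flag}")
```

Kernel threads always report UID 0. A `vhost-<pid>` entry therefore tells you less than a userspace daemon like virtiofsd showing up as root.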
USENIX Security '23 gave this an unflattering name: Operation Forwarding Attacks.
Some Guest syscalls get forwarded to that highly privileged proxy on the host for execution. Physical isolation? Sidestepped.
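CVE-2022-0358 shows it end to end. A plain open(..., O_CREAT) inside the container is forwarded across virtio to virtiofsd. On the host, the kernel's inode_init_owner() check should strip the SGID bit from the new file, but it is evaluated against virtiofsd's credentials instead of the Guest caller's. The bit survives, and a file with root SGID lands in a shared host directory.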
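This is a classic confused deputy, and the whole thing can be modelled in a few lines. The sketch below is a simplified Python model of the SGID-stripping rule that inode_init_owner() applied to new regular files. It is not kernel code.

It rests on one assumption about the mechanism, taken from our reading of the upstream QEMU fix: virtiofsd switched to the Guest caller's UID/GID before creating the file, but kept its own supplementary group membership (modelled here, hypothetically, as the root group). As a result, the kernel's "is the creator in this group?" test was answered for the daemon, not for the Guest. `Creds`, `new_file_mode`, and `daemon_creds_for` are illustrative names.

```python
from dataclasses import dataclass

S_ISGID = 0o2000  # set-group-ID bit
S_IXGRP = 0o0010  # group-execute bit


@dataclass(frozen=True)
class Creds:
    fsuid: int
    fsgid: int
    groups: frozenset  # supplementary groups
    cap_fsetid: bool = False


def in_group(creds, gid):
    """Rough analogue of the kernel's in_group_p(): fsgid or a supplementary group."""
    return gid == creds.fsgid or gid in creds.groups


def new_file_mode(dir_mode, dir_gid, requested_mode, creds):
    """Simplified SGID rule for a regular file created in a directory.

    In an SGID directory the new file takes the directory's group; a requested
    SGID bit (with group-exec) survives only if the *creating thread* is in
    that group or holds CAP_FSETID.
    """
    mode = requested_mode
    if dir_mode & S_ISGID:
        wants_sgid = (mode & (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)
        if wants_sgid and not in_group(creds, dir_gid) and not creds.cap_fsetid:
            mode &= ~S_ISGID
    return mode


def daemon_creds_for(guest_uid, guest_gid, drop_supplementary):
    """Hypothetical credentials of a daemon thread serving one Guest request.

    The daemon switches fsuid/fsgid to the Guest caller (which, in this model,
    also drops CAP_FSETID) but keeps its own supplementary groups, assumed to
    include gid 0, unless it explicitly drops them.
    """
    groups = frozenset() if drop_supplementary else frozenset({0})
    return Creds(guest_uid, guest_gid, groups, cap_fsetid=False)


SHARED_DIR_MODE, SHARED_DIR_GID = 0o2777, 0  # SGID, world-writable, group root
REQUEST = 0o2755                             # Guest asks for SGID + group-exec

vulnerable = new_file_mode(SHARED_DIR_MODE, SHARED_DIR_GID, REQUEST,
                           daemon_creds_for(1000, 1000, drop_supplementary=False))
patched = new_file_mode(SHARED_DIR_MODE, SHARED_DIR_GID, REQUEST,
                        daemon_creds_for(1000, 1000, drop_supplementary=True))
print(oct(vulnerable), oct(patched))  # → 0o2755 0o755
```

Both calls use the same request and the same directory. The only difference is whose group membership the check sees, and that difference decides whether an unprivileged Guest user ends up owning a root-group SGID file on the host.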
Container root → host root. The hardware boundary of the microVM was never crossed. It was flanked.
3. It's not just virtiofsd
| Forwarding surface | Attack shape | Measured impact |
|---|---|---|
| virtiofsd (file) | Daemon privilege abuse | Container → host root (CVE-2022-0358) |
| virtio-blk (storage) | I/O amplification | Co-located neighbor I/O drops 93.4% |
| virtio-net (network) | Packet-parse amplification | Host kernel nf_conntrack table fills instantly |
| vhost-net / KVM PIT worker threads | cgroup attribution missing | Guest borrows host kernel-thread cycles, bypasses vCPU quota |
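To see which cgroup these host-side workers are actually charged to, you can compare each worker's `/proc/<pid>/cgroup` with the VMM's. The sketch below is minimal and makes two assumptions.

First, it assumes the worker names used by the kernels we know of: `vhost-<owner pid>` for vhost workers and `kvm-pit/<owner pid>` for the KVM PIT worker. Names and layout vary by kernel version. On some newer kernels, vhost workers run as threads of the VMM process itself and will not appear in this top-level scan, so treat "no match" as "look elsewhere", not as "safe".

Second, the helper names (`parse_cgroup`, `worker_pids`, `report`) are illustrative.

```python
import os


def parse_cgroup(text):
    """Map each hierarchy in /proc/<pid>/cgroup text to its cgroup path.

    Lines look like 'hierarchy-ID:controller-list:path'. On a pure cgroup v2
    host there is a single '0::/path' line, keyed here as 'unified'.
    """
    result = {}
    for line in text.strip().splitlines():
        _hier_id, controllers, path = line.split(":", 2)
        result[controllers or "unified"] = path
    return result


def read_proc(pid, name):
    """Read /proc/<pid>/<name> as text."""
    with open(f"/proc/{pid}/{name}") as f:
        return f.read()


def worker_pids(vmm_pid):
    """Yield PIDs of kernel workers named after vmm_pid (naming is an assumption)."""
    wanted = {f"vhost-{vmm_pid}", f"kvm-pit/{vmm_pid}"}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            if read_proc(entry, "comm").strip() in wanted:
                yield int(entry)
        except OSError:
            continue  # the process went away mid-scan


def report(vmm_pid):
    """Print each worker's cgroup and whether it matches the VMM's."""
    vmm_cg = parse_cgroup(read_proc(vmm_pid, "cgroup"))
    for pid in worker_pids(vmm_pid):
        cg = parse_cgroup(read_proc(pid, "cgroup"))
        verdict = "same cgroup as VMM" if cg == vmm_cg else "NOT in VMM's cgroup"
        print(pid, read_proc(pid, "comm").strip(), verdict, cg)
```

Call `report()` as root with the VMM's PID. A worker marked "NOT in VMM's cgroup" is doing Guest-driven work that the VMM's cgroup limits do not see, which makes it a candidate for the unattributed cycles in the last row.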
Every row has the same shape: the physical boundary holds, but the operation-forwarding pipes on either side of it do not.
Each pipe has a host-side proxy: a daemon, the VMM main process, or a host kernel thread. Each proxy is more privileged than anything in the Guest. All the Guest needs is to make the proxy do something on its behalf, and from then on the Guest speaks with the proxy's voice.
Upgrading to microVMs doesn't make these proxies disappear. It moves them from "kernel namespace bookkeeping" to "a row of root daemons in host userspace." The attack surface didn't vanish. It moved.
4. The industry answer is "nest one more layer"
- vhost-user offload: peel virtual devices out of the VMM main process and run them as isolated, low-privilege daemons.
- Reverse user namespace: use a user namespace to strip virtiofsd of real host root before letting it serve the Guest.
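- Jailer: lock the VMM into chroot + cgroups + tight seccomp. In Firecracker's case, the jailer handles the chroot, cgroups, and privilege drop, and Firecracker's own seccomp filters then allow just 24 syscalls and 30 ioctls.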
- Matryoshka: bare metal → Jailer-wrapped VMM → ephemeral Guest kernel → OCI container inside Guest → agent code inside container. Every layer distrusts the next.
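The "reverse user namespace" item depends on user-namespace ID mapping. Each line of `/proc/<pid>/uid_map` has the form `ID-inside-ns ID-outside-ns length`, and the kernel translates IDs through that table.

The sketch below models the translation in Python. It assumes a hypothetical mapping of `0 100000 65536`. Under that mapping the daemon still believes it is UID 0, but the host sees it as UID 100000, and real host root has no mapping at all. The function names are illustrative.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, int, int]  # (inside, outside, length)


def parse_id_map(text: str) -> List[Entry]:
    """Parse uid_map/gid_map text: 'inside outside length' per line."""
    entries = []
    for line in text.strip().splitlines():
        inside, outside, length = (int(field) for field in line.split())
        entries.append((inside, outside, length))
    return entries


def to_host_id(ns_id: int, entries: List[Entry]) -> Optional[int]:
    """Translate an ID inside the namespace to its host ID, or None if unmapped."""
    for inside, outside, length in entries:
        if inside <= ns_id < inside + length:
            return outside + (ns_id - inside)
    return None  # unmapped: the namespace cannot act as this ID at all


def to_ns_id(host_id: int, entries: List[Entry]) -> Optional[int]:
    """Reverse direction: which namespace ID (if any) a host ID appears as.

    Host IDs with no mapping show up inside the namespace as the overflow ID
    (usually 65534); None stands in for that here.
    """
    for inside, outside, length in entries:
        if outside <= host_id < outside + length:
            return inside + (host_id - outside)
    return None


# Hypothetical mapping for a sandboxed virtiofsd: ns 0..65535 -> host 100000..165535.
MAP = parse_id_map("0 100000 65536\n")

print(to_host_id(0, MAP))     # → 100000  ("root" inside the namespace)
print(to_host_id(1000, MAP))  # → 101000  (an ordinary Guest user)
print(to_ns_id(0, MAP))       # → None    (real host root is not mapped)
```

The key result is `to_ns_id(0, MAP) is None`. Real host root does not exist inside the daemon's namespace, so files created by a forwarded operation are owned by unprivileged host UIDs from 100000 up, whatever the daemon believes about itself.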
This works. The cost: you now have N more long-lived host daemons to audit, patch, and authorize. Every nesting layer adds another permanent host process that is more privileged than the Guest it serves.
So I guess we need a different way to run agents in a sandbox. What would you propose?