Notes on building a deterministic FSM runtime for LLM agents
Most AI agent runtimes currently follow the same execution pattern:
LLM -> tool call -> runtime executes side-effect
That works reasonably well for read-only tasks. But once agents start mutating external state (payments, databases, infrastructure, PII), the execution model becomes difficult to reason about operationally.
While preparing some of our internal agents, we ended up separating reasoning from execution authority entirely.
We built nano-vm: a deterministic FSM runtime where:
- the model proposes actions,
- but the runtime controls state transitions and side-effects.
The runtime enforces:
- finite execution graphs,
- compile-time step ordering,
- capability-gated tools,
- replay/idempotency boundaries,
- append-only audit history.
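To make the split between "model proposes" and "runtime decides" concrete, here is a minimal sketch of the idea. It is not nano-vm's actual API: `Runtime`, `propose`, `TransitionError`, and the convention that an action's name doubles as its capability name are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


class TransitionError(Exception):
    """Raised when a proposed action is rejected by the runtime."""


@dataclass
class Runtime:
    # Allowed transitions: state -> {action: next_state}.
    # The graph is fixed before the run starts, so it is finite by construction.
    graph: dict
    # Capabilities granted to this run (assumption: capability name == action name).
    capabilities: frozenset
    state: str = "start"
    # Append-only history of every proposal, accepted or rejected.
    audit: list = field(default_factory=list)

    def propose(self, action: str) -> str:
        """Called with whatever the model suggests; the runtime has the final say."""
        allowed = self.graph.get(self.state, {})
        if action not in allowed:
            self.audit.append((self.state, action, "rejected:transition"))
            raise TransitionError(f"{action!r} is not a valid step from {self.state!r}")
        if action not in self.capabilities:
            self.audit.append((self.state, action, "rejected:capability"))
            raise TransitionError(f"run lacks capability for {action!r}")
        self.audit.append((self.state, action, "accepted"))
        self.state = allowed[action]
        return self.state
```

A run built with `graph={"start": {"lookup": "checked"}, "checked": {"refund": "done"}}` and only the `lookup` capability can move to `checked`, but a proposed `refund` is rejected and logged without changing state.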
One design choice that turned out to be important:
the policy layer is intentionally less expressive than Python.
We removed eval-style execution entirely and constrained policies to a small deterministic AST subset:
- simple operators,
- no loops,
- no system calls.
That limitation made policies easier to audit and removed several classes of runtime behavior we did not want in financial-style workflows.
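For readers who have not built one of these, a restricted policy evaluator can be sketched with Python's standard `ast` module. This is an illustration under my own assumptions, not the nano-vm policy language: the function names (`eval_policy`, `_eval`), the `PolicyError` type, and the exact operator whitelist are hypothetical.

```python
import ast
import operator

# Whitelisted operators (assumed subset; anything else is rejected).
_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub}
_CMP_OPS = {
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}


class PolicyError(Exception):
    """Raised when a policy uses syntax outside the allowed subset."""


def eval_policy(source: str, variables: dict):
    """Evaluate a single expression by walking its AST; eval() is never called."""
    try:
        tree = ast.parse(source, mode="eval")  # statements such as loops fail to parse here
    except SyntaxError as exc:
        raise PolicyError(f"not a valid policy expression: {exc.msg}") from exc
    return _eval(tree.body, variables)


def _eval(node, variables):
    if isinstance(node, ast.Constant):
        if isinstance(node.value, (bool, int, float, str)):
            return node.value
        raise PolicyError("unsupported constant type")
    if isinstance(node, ast.Name):
        if node.id not in variables:
            raise PolicyError(f"unknown variable: {node.id}")
        return variables[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left, variables), _eval(node.right, variables))
    if (isinstance(node, ast.Compare) and len(node.ops) == 1
            and type(node.ops[0]) in _CMP_OPS):
        left = _eval(node.left, variables)
        right = _eval(node.comparators[0], variables)
        return _CMP_OPS[type(node.ops[0])](left, right)
    if isinstance(node, ast.BoolOp):
        # Evaluates every operand (no short-circuit) and returns a plain bool.
        values = [bool(_eval(v, variables)) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    # Calls, attribute access, comprehensions, lambdas, etc. all land here.
    raise PolicyError(f"disallowed syntax: {type(node).__name__}")
```

With this shape, a policy like `amount <= limit and currency == 'EUR'` evaluates against a plain dict of variables, while something like `__import__('os').system('ls')` is rejected at the `Call` node before anything runs.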
To test failure semantics, we added a Sabotage Mode with several adversarial cases:
- unauthorized tool injection,
- replay attempts,
- hash corruption,
- skipped transitions.
Operationally, the most useful property so far has probably been the deterministic replay boundaries around side-effects.
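As a rough illustration of what a replay boundary can look like, the sketch below derives an idempotency key from the run, step, and arguments, and returns the recorded result instead of re-running the side-effect when the same key shows up again. The names (`EffectLog`, `execute`, `key`) and the in-memory dict are assumptions for the example; a real system would persist the log.

```python
import hashlib
import json


class EffectLog:
    """Records side-effect results so a replayed step never executes twice."""

    def __init__(self):
        self._results = {}  # idempotency key -> recorded result

    @staticmethod
    def key(run_id: str, step: str, args: dict) -> str:
        # Canonical JSON so the same logical call always hashes to the same key.
        payload = json.dumps([run_id, step, args], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def execute(self, run_id: str, step: str, args: dict, effect):
        k = self.key(run_id, step, args)
        if k in self._results:
            # Replay: return what happened the first time, do not call the tool again.
            return self._results[k]
        result = effect(**args)
        self._results[k] = result
        return result
```

The point of the design is that the decision to execute lives in the runtime: a retried or replayed step with identical inputs gets the stored result, and only a genuinely new key reaches the external system.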
We also had to deal with an awkward compliance problem:
preserving immutable audit chains while supporting GDPR-style erasure requests.
Our current approach replaces vault references with tombstones while preserving hash continuity and referential integrity.
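One way to read that, sketched below under my own assumptions rather than as the actual nano-vm design: audit entries hash only an opaque vault reference, never the personal data itself, and erasure swaps the vault record behind that reference for a tombstone. `AuditChain`, `entry_hash`, `TOMBSTONE`, and the `vault:N` reference format are hypothetical names.

```python
import hashlib
import json

TOMBSTONE = {"erased": True}  # assumed marker for an erased vault record


def entry_hash(prev_hash: str, event: str, vault_ref: str) -> str:
    # The hash covers the reference only, so erasing the payload cannot change it.
    material = json.dumps({"prev": prev_hash, "event": event, "ref": vault_ref}, sort_keys=True)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class AuditChain:
    def __init__(self):
        self.entries = []  # append-only list of hashed entries
        self.vault = {}    # vault_ref -> personal data, or a tombstone after erasure

    def append(self, event: str, personal_data: dict) -> str:
        vault_ref = f"vault:{len(self.entries)}"
        self.vault[vault_ref] = personal_data
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        self.entries.append({
            "event": event,
            "ref": vault_ref,
            "prev": prev,
            "hash": entry_hash(prev, event, vault_ref),
        })
        return vault_ref

    def erase(self, vault_ref: str) -> None:
        if vault_ref not in self.vault:
            raise KeyError(vault_ref)
        self.vault[vault_ref] = dict(TOMBSTONE)  # data is gone, the reference still resolves

    def verify(self) -> bool:
        prev = "genesis"
        for e in self.entries:
            if e["prev"] != prev or e["hash"] != entry_hash(prev, e["event"], e["ref"]):
                return False  # hash continuity broken
            if e["ref"] not in self.vault:
                return False  # dangling reference
            prev = e["hash"]
        return True
```

In this sketch, erasing a record leaves `verify()` passing because no hashed field changed, while tampering with an entry's event or hash makes it fail.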
I'm mostly curious how others are handling execution authority in stateful agent systems.
Are you letting the model directly drive side-effects, or inserting a deterministic control layer in between?
I'll drop the GitHub links to the core runtime and MCP layer in the comments if anyone wants to look at the implementation.