
▲ 20 r/codex
Computer Use + Codex Spark feels kind of insane
I’ve been experimenting with separating “planning” from “execution” in computer-use agents.
Instead of letting a frontier model handle every tiny UI action directly, I split it into:
- Codex 5.5 → planning / judgment
- Codex Spark → actual execution
The surprising part is how much better this feels in practice.
A lot of computer-use workflows don’t actually need expensive reasoning for every step:
- scrolling
- clicking buttons
- navigating pages
- repetitive UI interactions
- QA loops
- E2E-style testing
Those tasks mostly need:
- speed
- determinism
- low latency
The frontier model is mainly useful for:
- deciding what to do
- recovering from failures
- understanding intent
- planning multi-step workflows
So instead of burning tokens on tiny execution details, I let Spark handle the execution layer.
Current experiments:
- automated QA
- browser-driven implementation
- E2E testing
- repetitive UI workflows
- computer-use orchestration
I recorded comparison videos between:
- direct Codex 5.5 computer use (It may look frozen, but GPT-5.5 is actually thinking very hard.)
https://reddit.com/link/1t7vme0/video/sb29uu01q10h1/player
- planner/executor split with Spark
https://reddit.com/link/1t7vme0/video/89yhsan3q10h1/player
The speed difference is pretty interesting.
u/JustProcedure4155 — 14 days ago