u/JustProcedure4155

Computer Use + Codex Spark feels kind of insane
▲ 20 r/codex

Computer Use + Codex Spark feels kind of insane

I’ve been experimenting with separating “planning” from “execution” in computer-use agents.

Instead of letting a frontier model handle every tiny UI action directly, I split it into:

  • Codex 5.5 → planning / judgment
  • Codex Spark → actual execution

The surprising part is how much better this feels in practice.

A lot of computer-use workflows don’t actually need expensive reasoning for every step:

  • scrolling
  • clicking buttons
  • navigating pages
  • repetitive UI interactions
  • QA loops
  • E2E-style testing

Those tasks mostly need:

  • speed
  • determinism
  • low latency

The frontier model is mainly useful for:

  • deciding what to do
  • recovering from failures
  • understanding intent
  • planning multi-step workflows

So instead of burning tokens on tiny execution details, I let Spark handle the execution layer.

Current experiments:

  • automated QA
  • browser-driven implementation
  • E2E testing
  • repetitive UI workflows
  • computer-use orchestration

I recorded comparison videos between:

  1. direct Codex 5.5 computer use (It may look frozen, but GPT-5.5 is actually thinking very hard.)

https://reddit.com/link/1t7vme0/video/sb29uu01q10h1/player

  1. planner/executor split with Spark

https://reddit.com/link/1t7vme0/video/89yhsan3q10h1/player

The speed difference is pretty interesting.

Repo:
https://github.com/KingGyuSuh/awesome-codex-spark

reddit.com
u/JustProcedure4155 — 14 days ago