u/JustProcedure4155

I’ve been experimenting with separating “planning” from “execution” in computer-use agents.

Instead of letting a frontier model handle every tiny UI action directly, I split it into:

The surprising part is how much better this feels in practice.

A lot of computer-use workflows don’t actually need expensive reasoning for every step:

Those tasks mostly need:

The frontier model is mainly useful for:

So instead of burning tokens on tiny execution details, I let Spark handle the execution layer.

Current experiments:

I recorded comparison videos between:

direct Codex 5.5 computer use (It may look frozen, but GPT-5.5 is actually thinking very hard.)

The speed difference is pretty interesting.