Running three agents via Codex: Main (GTD, crons, morning ritual, routing), Mentor (advice, journaling), and Couples (separate WhatsApp group). Main is the only entry point; Mentor has no direct channel.

The problem

Main is dropping balls: losing list items, mis-routing, marking wrong tasks done. Mentor is worse: same model, but answers routed through it come out dramatically thinner than asking directly.

Concrete example from this morning: I described feeling wired and restless, couldn't meditate, wanted to be present with my partner. Mentor gave me two lines of surface coaching. Same prompt to raw ChatGPT returned a proper diagnosis (late caffeine, blocked deep sleep, dopamine spike) plus a 5-step protocol. The Mentor reply isn't wrong, it's just shallow when I needed a real answer.

The core question

Is this an architecture problem (Main overloaded, context polluted, routing adding noise) or a prompt and orchestration problem that splitting agents won't fix?

Genuinely don't know. One direction I'm considering: make Main a thin stateless router and give each specialist a tighter prompt and narrower tool surface. But I'm not sold on it yet; it might just be that my prompts are weak.

Questions

Anyone running router + specialists in OpenClaw? Pitfalls to watch for (double-hop latency, context bleed, tool-call failures across agents)?
How do you handle shared state across specialists? E.g. an advice agent that needs open GTD next-actions, or a health agent that needs calendar context.
Anyone seeing quality drops routing through Codex vs. hitting the model directly? Same model, same prompt, noticeably worse output through the framework.
For advice-only agents, is framework overhead actually costing answer quality, or is it almost always a prompt problem?

Happy to share my Main system prompt in the comments.

u/Annual_Elderberry541

Should I split Main into a thin router + specialist agents? (architecture sanity check)

The problem

The core question

Questions