u/2thick2fly

I'm building a locally run application that integrates with coding assistants.

So far I've worked with Codex and Copilot. Claude Code and Gemini are next, once I get to a stable solution with the first two.

Right now I'm interfacing with Codex through the CLI, specifically with:

codex exec --json --output-last-message "prompt e.g. modify file x by adding Y or run z test"

And with Copilot through:

copilot --model gpt-5.4 --output-format json "prompt e.g. modify file x by adding y"

I'm considering switching the Copilot side to ACP, but I haven't looked into that properly yet.

Afterwards, my application needs to read the output without using AI and parse it into a report. I'm also considering reading the session data. The goal is to eventually make a deterministic judgment about whether the coding agent actually did what it was supposed to do (e.g. modify files), and then pick the next step from a decision tree. It is also imperative to capture any tool failures, errors, or warnings.
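To make the parsing step concrete, here is a rough sketch of what I have in mind. It assumes the CLI prints one JSON object per line; the field names (`type`, `tool`, `message`) and the event kinds (`tool_call`, `error`, `tool_error`) are placeholders I made up for illustration, not the real Codex or Copilot schema, so they would need to be mapped to whatever each tool actually emits.

```python
import json
from dataclasses import dataclass, field


@dataclass
class RunReport:
    tool_calls: list = field(default_factory=list)  # names of tools the agent invoked
    errors: list = field(default_factory=list)      # error messages reported by the agent
    unparsed: list = field(default_factory=list)    # lines that were not JSON objects


def build_report(lines):
    """Fold an iterable of JSON Lines strings into a RunReport."""
    report = RunReport()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            report.unparsed.append(raw)
            continue
        if not isinstance(event, dict):
            report.unparsed.append(raw)
            continue
        kind = event.get("type")  # hypothetical field name
        if kind == "tool_call":
            report.tool_calls.append(event.get("tool", "<unknown>"))
        elif kind in ("error", "tool_error"):
            report.errors.append(event.get("message", raw))
    return report


def next_step(report):
    """Tiny decision tree: same report in, same decision out."""
    if report.errors:
        return "retry"
    if report.unparsed:
        return "inspect"   # output did not match the expected format
    if not report.tool_calls:
        return "escalate"  # the agent talked but never acted
    return "continue"
```

In use I would feed it the captured stdout, e.g. `next_step(build_report(stdout.splitlines()))`. Keeping unparsed lines instead of dropping them means a format change in the CLI shows up as an explicit "inspect" outcome rather than a silent pass.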

The part I'm unsure about is that this approach (reading the CLI output) feels a bit dirty and cowboy-ish. My instinct says it is not the robust way of doing it, and I need this part of my software to be spot on, with an assessment that is reliable and deterministic. Driving the tools through CLI output parsing does not feel like the cleanest long-term solution.
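One check that would not depend on the CLI output at all is inspecting the working tree myself before and after the run. A minimal sketch, assuming the project is small enough to hash every file; the function names are my own and purely illustrative:

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(before, after):
    """Compare two snapshots and list added, removed, and modified paths."""
    common = set(before) & set(after)
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "modified": sorted(p for p in common if before[p] != after[p]),
    }


def did_expected_change(changes, expected_paths):
    """True only if every expected path was actually added or modified."""
    touched = set(changes["added"]) | set(changes["modified"])
    return set(expected_paths) <= touched
```

The idea is to call `snapshot()` before launching the agent and again after it exits, then compare. This walks everything under the root, so in a real repository I would probably skip directories like `.git` and build output.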

Has anyone found a better approach for this?

PS: Right now I am specifically looking for a way to read the metadata for any errors, tool failures, tool invocations, etc.

I'm not sure this is the right subreddit to post this, but it's the best I know, so let's give it a shot 😊
