u/Every_Prior7165

I built a tiny Codex harness for scoring runs instead of just vibes

/goal is great for keeping Codex moving, but I wanted something slightly different: a repo-local way to define the task, scorer, constraints, stop condition, and run history before the loop starts.

The missing piece for me was not “continue until done.”

It was:

- what are we optimizing for?
- what score did this run actually get?
- is this attempt better than the previous one?
- what should count as the current frontier?

So I started building `oh-my-darwin`, a small CLI wrapper around Codex.

It’s heavily inspired by `oh-my-codex`: I liked the idea of turning Codex from a one-shot coding assistant into a more structured workflow system, but wanted a tiny repo-local harness focused specifically on scoring, baselines, and iterative improvement.

Current flow:

- `darwin init` interviews you and writes a repo-local `.darwin/meta-spec.md`
- `darwin baseline` runs the task once and records the realized score
- every Codex lifecycle event gets logged to `.darwin/events.jsonl`
- attempts get tracked in `.darwin/evolution.jsonl` and `.darwin/frontier.json`
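
To make the attempt/frontier bookkeeping concrete, here is a minimal Python sketch of how an append-only attempt log and a "best so far" file could fit together. This is not the actual oh-my-darwin code: the `attempt` and `score` field names, the `record_attempt` helper, and the "higher score is better" rule are all assumptions for illustration. Only the file names come from the list above.

```python
import json
import tempfile
from pathlib import Path


def record_attempt(darwin_dir: Path, attempt_id: str, score: float) -> dict:
    """Append one attempt to evolution.jsonl and update frontier.json if it wins.

    Hypothetical helper: the schema below is made up for illustration.
    """
    darwin_dir.mkdir(parents=True, exist_ok=True)
    entry = {"attempt": attempt_id, "score": score}  # assumed schema

    # JSONL: one JSON object per line, append-only, so history is never rewritten.
    with (darwin_dir / "evolution.jsonl").open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

    frontier_path = darwin_dir / "frontier.json"
    frontier = None
    if frontier_path.exists():
        frontier = json.loads(frontier_path.read_text(encoding="utf-8"))

    # Assumption: higher is better. Ties keep the earlier attempt.
    if frontier is None or score > frontier["score"]:
        frontier = entry
        frontier_path.write_text(json.dumps(frontier), encoding="utf-8")
    return frontier


def demo() -> tuple[dict, int]:
    """Record a baseline plus two attempts in a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp) / ".darwin"
        record_attempt(d, "baseline", 0.60)
        record_attempt(d, "attempt-1", 0.72)
        frontier = record_attempt(d, "attempt-2", 0.65)  # worse than attempt-1
        logged = len((d / "evolution.jsonl").read_text(encoding="utf-8").splitlines())
        return frontier, logged
```

In this sketch every attempt lands in the log, including regressions, while the frontier file only moves when a run actually beats the current best. That split is what lets you answer "is this attempt better than the previous one?" without losing the failed runs.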

The idea is to turn:

> “Codex, improve this thing”

into:

> task + scorer + constraints + human review pattern + stop condition + run history
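
As a rough illustration of that bundle, the pieces could be modeled as a small Python data structure with an explicit stop check. Everything here is hypothetical: the real spec lives in `.darwin/meta-spec.md` as markdown, and the `MetaSpec` class, its field names, and the stop rule (target score reached or attempt budget used up) are assumptions, not the tool's format.

```python
from dataclasses import dataclass, field


@dataclass
class MetaSpec:
    """Hypothetical in-memory view of a task spec; not the real meta-spec.md schema."""

    task: str                      # what Codex is asked to improve
    scorer_cmd: str                # command whose output is turned into a score
    constraints: list[str] = field(default_factory=list)
    target_score: float = 1.0      # assumed: higher is better
    max_attempts: int = 10         # attempt budget
    history: list[float] = field(default_factory=list)  # realized scores so far

    def should_stop(self) -> bool:
        """Stop when the budget is spent or any run has reached the target."""
        if len(self.history) >= self.max_attempts:
            return True
        return bool(self.history) and max(self.history) >= self.target_score
```

A loop built on something like this would score each run, append to `history`, and call `should_stop()` before starting the next attempt, so "done" is decided by the spec rather than by the model.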

Quick example:

```bash
npm install
npm link

cd ~/your-project
darwin init
darwin baseline
```

The repeated `darwin meta` loop itself is still a work in progress, so this is early. Right now it’s mostly the on-ramp: spec generation, baseline capture, hook logging, and local run history.

Repo:
https://github.com/clarence-lee-sheng/oh-my-darwin

It would be useful to hear from other Codex power users: what would you want a scoring/frontier loop to track before trusting it on real repo work?

u/Every_Prior7165 — 7 days ago