
I built a tiny Codex harness that scores runs instead of relying on vibes

`/goal` is great for keeping Codex moving, but I wanted something slightly different: a repo-local way to define the task, scorer, constraints, stop condition, and run history before the loop starts.
The missing piece for me was not “continue until done.”
It was:
- what are we optimizing for?
- what score did this run actually get?
- is this attempt better than the previous one?
- what should count as the current frontier?

So I started building `oh-my-darwin`, a small CLI wrapper around Codex.

It’s heavily inspired by `oh-my-codex`: I liked the idea of turning Codex from a one-shot coding assistant into a more structured workflow system, but wanted a tiny repo-local harness focused specifically on scoring, baselines, and iterative improvement.

Current flow:
- `darwin init` interviews you and writes a repo-local `.darwin/meta-spec.md`
- `darwin baseline` runs the task once and records the realized score
- every Codex lifecycle event gets logged to `.darwin/events.jsonl`
- attempts get tracked in `.darwin/evolution.jsonl` and `.darwin/frontier.json`

The idea is to turn:

> “Codex, improve this thing”

into:

> task + scorer + constraints + human review pattern + stop condition + run history

Quick example:
```bash
npm install
npm link
cd ~/your-project
darwin init
darwin baseline
```
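
To make "task + scorer + constraints + stop condition" a bit more concrete, here is a rough TypeScript sketch of how that information could be modeled. It is only an illustration: the real spec lives in `.darwin/meta-spec.md` as markdown, and every name below (`MetaSpec`, `scorerCommand`, `higherIsBetter`, `shouldStop`, and so on) is hypothetical rather than taken from `oh-my-darwin`.

```typescript
// Hypothetical model of what a run spec captures.
// None of these field names come from oh-my-darwin itself.
interface MetaSpec {
  task: string;            // what the agent is asked to improve
  scorerCommand: string;   // assumed: a command that yields one numeric score
  higherIsBetter: boolean; // direction of optimization
  constraints: string[];   // things the agent must not do
  stop: { targetScore?: number; maxAttempts: number };
}

// Stop when the attempt budget is spent or any recorded score reaches the target.
function shouldStop(spec: MetaSpec, scores: number[]): boolean {
  if (scores.length >= spec.stop.maxAttempts) return true;
  const target = spec.stop.targetScore;
  if (target === undefined) return false;
  return scores.some((s) => (spec.higherIsBetter ? s >= target : s <= target));
}

// Made-up example values, purely for illustration.
const exampleSpec: MetaSpec = {
  task: "Reduce test suite runtime",
  scorerCommand: "npm test",
  higherIsBetter: false,
  constraints: ["do not delete tests"],
  stop: { targetScore: 30, maxAttempts: 5 },
};

const keepGoing = shouldStop(exampleSpec, [48, 41]);  // false: target not reached, budget left
const reachedTarget = shouldStop(exampleSpec, [48, 29]); // true: 29 <= 30
```
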
The actual repeated `darwin meta` loop is still WIP, so this is early. Right now it’s mostly the on-ramp: spec generation, baseline capture, hook logging, and local run history.
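
As a sketch of the kind of bookkeeping such a loop implies, the snippet below shows one way to decide whether an attempt beats the current frontier and to serialize the history as JSON Lines. The record shapes (`Attempt`, `Frontier`) and the strict-improvement rule are assumptions made for illustration; they are not the actual schema of `.darwin/evolution.jsonl` or `.darwin/frontier.json`.

```typescript
// Hypothetical record shapes; the real files may look quite different.
interface Attempt {
  id: string;
  score: number;
  parentId: string | null; // attempt this one was derived from, if any
}

interface Frontier {
  bestId: string;
  bestScore: number;
}

// Return the frontier after considering one attempt.
// Assumption: only a strict improvement replaces the frontier, so ties keep the older attempt.
function updateFrontier(
  frontier: Frontier | null,
  attempt: Attempt,
  higherIsBetter: boolean = true,
): Frontier {
  if (frontier === null) {
    return { bestId: attempt.id, bestScore: attempt.score };
  }
  const improved = higherIsBetter
    ? attempt.score > frontier.bestScore
    : attempt.score < frontier.bestScore;
  return improved ? { bestId: attempt.id, bestScore: attempt.score } : frontier;
}

// Serialize attempts as JSON Lines: one JSON object per line.
function toJsonl(attempts: Attempt[]): string {
  return attempts.map((a) => JSON.stringify(a)).join("\n") + "\n";
}

// Made-up scores: a baseline followed by two attempts.
const history: Attempt[] = [
  { id: "baseline", score: 0.62, parentId: null },
  { id: "attempt-1", score: 0.58, parentId: "baseline" },
  { id: "attempt-2", score: 0.71, parentId: "baseline" },
];

// Fold the history into a frontier, starting from "no frontier yet".
const finalFrontier = history.reduce<Frontier | null>(
  (frontier, attempt) => updateFrontier(frontier, attempt),
  null,
);
// finalFrontier is { bestId: "attempt-2", bestScore: 0.71 }
```
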
Repo:
https://github.com/clarence-lee-sheng/oh-my-darwin

Would be useful to hear from other Codex power users: what would you want a scoring/frontier loop to track before trusting it on real repo work?