I built a Goodhart-proof AI coding agent that runs locally on 4GB VRAM. It physically cannot see your tests.

I've been researching how AI coding agents inevitably optimize for passing the metric rather than solving the problem (Goodhart's Law). Commercial tools rely on prompt engineering and post-hoc review, but those safeguards are disciplinary, not architectural.

I built an open-source 4-layer pipeline (Planning → Execution → Verification → Optimization) where information asymmetry is enforced via strict TypedDict contracts and LangGraph state isolation:
• The execution agent never receives acceptance criteria, unit tests, or the verification rubric.
• Verification is blind: it evaluates git diffs without author identity or original prompt context.
• Retry feedback is sanitized to abstract guidance only (prevents rubric memorization).
• Neo4j graph analysis replaces context-window stuffing with precise AST dependency mapping.
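The isolation described in the first three bullets can be sketched with plain `typing.TypedDict` projections. This is a minimal sketch, not the repo's code: the type and function names (`PlanState`, `ExecutionInput`, `VerificationInput`, `to_execution_input`, `sanitize_feedback`), the assumption that the verifier receives the rubric, and the feedback wording are all hypothetical. One caveat worth making explicit: TypedDict is only a static type hint, so the boundary has to be enforced by building a fresh dict from an explicit allowlist, not by the type itself.

```python
from typing import TypedDict


class PlanState(TypedDict):
    """Full state held by the planning layer (hypothetical fields)."""
    task_description: str
    target_files: list[str]
    acceptance_criteria: list[str]
    unit_tests: list[str]
    rubric: dict[str, str]
    author: str


class ExecutionInput(TypedDict):
    """All the execution agent may see: no criteria, tests, or rubric."""
    task_description: str
    target_files: list[str]


class VerificationInput(TypedDict):
    """All the blind verifier may see (assumed): the diff and the rubric."""
    diff: str
    rubric: dict[str, str]


FORBIDDEN_FOR_EXECUTION = {"acceptance_criteria", "unit_tests", "rubric"}


def to_execution_input(state: PlanState) -> ExecutionInput:
    # TypedDict does not strip extra keys at runtime, so copy only
    # allowlisted fields instead of passing the full state along.
    projected = ExecutionInput(
        task_description=state["task_description"],
        target_files=list(state["target_files"]),
    )
    leaked = FORBIDDEN_FOR_EXECUTION.intersection(projected)
    assert not leaked, f"isolation violated: {leaked}"
    return projected


def to_verification_input(diff: str, state: PlanState) -> VerificationInput:
    # Author identity and the original prompt never cross this boundary.
    return VerificationInput(diff=diff, rubric=dict(state["rubric"]))


def sanitize_feedback(failed_criteria: list[str]) -> str:
    # Report only how many checks failed plus generic advice, so the
    # executor never sees rubric wording it could memorize across retries.
    if not failed_criteria:
        return "No issues found."
    return (
        f"{len(failed_criteria)} requirement(s) not met. "
        "Re-read the task description and check edge cases and error handling."
    )
```

In a LangGraph pipeline, projections like these would presumably sit on the edges between nodes; the design point is that leakage is prevented by what each node is handed, not by instructions in its prompt.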
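The last bullet (AST dependency mapping instead of context-window stuffing) can be illustrated with Python's standard `ast` module. This is a hedged sketch under stated assumptions: an in-memory dict stands in for the Neo4j graph, only direct calls by bare name are tracked, and `call_graph` and `context_for` are hypothetical helpers, not functions from the repo.

```python
import ast
from collections import deque


def call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the names of functions it calls."""
    tree = ast.parse(source)
    graph: dict[str, set[str]] = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            callees: set[str] = set()
            for sub in ast.walk(node):
                # Only bare-name calls like helper(x); method calls are ignored.
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    callees.add(sub.func.id)
            graph[node.name] = callees
    return graph


def context_for(target: str, graph: dict[str, set[str]]) -> list[str]:
    """Breadth-first walk from target: the functions the agent needs in context."""
    seen = {target}
    order = [target]
    queue = deque([target])
    while queue:
        for callee in sorted(graph.get(queue.popleft(), set())):
            # Skip builtins and names not defined in this module.
            if callee in graph and callee not in seen:
                seen.add(callee)
                order.append(callee)
                queue.append(callee)
    return order


SOURCE = '''
def parse(raw):
    return normalize(raw.strip())

def normalize(text):
    return text.lower()

def render(items):
    return ", ".join(items)

def handle(raw):
    return render([parse(raw)])
'''

graph = call_graph(SOURCE)
print(context_for("parse", graph))  # ['parse', 'normalize']
```

Editing `parse` pulls in only `parse` and `normalize`, not the whole module; in a graph database the same walk would be a variable-length path query over call edges.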

Results: 26 s/feature and $0.03 cost (a local 3B model for execution plus API calls for reasoning), with reproducible benchmarks. Open source under the MIT license.

Repo: https://github.com/illyar80/developer-farm

I'm particularly interested in feedback on:

  1. Formal verification approaches to guarantee isolation properties
  2. Multi-model fallback strategies for the execution layer
  3. Benchmarking frameworks for "Goodhart-resistance" in autonomous agents

Would appreciate critiques and suggestions from folks working on AI alignment, evaluation, or agentic systems.

u/illyar80 — 6 days ago