Is “harness engineering” only a coding thing? What does a harness for knowledge work look like?
Everyone’s talking about harnesses this year, but every example comes from coding: files, lint, tests, diffs, LSP. The harness does half the work there: same model, same prompt, wildly different results depending on what’s around it.
I work in consulting and I keep thinking: we don’t actually need smarter models. Frontier-level reasoning is already overkill for most knowledge work. What we’re missing is the harness.
But “harness for knowledge work” is harder to picture. The substrate isn’t code, it’s claims + evidence + argument. So what would the equivalents be?
• Linting = sources resolve, terms are consistent, numbers reconcile, citations actually say what you claim they do
• Tests = adversarial reads, steelman the opposite, invert the recommendation
• Diffs = at the claim level, not the prose level (“what changed in the thinking”)
• Compile = same substrate, different audience-specific outputs
• Debug = trace any sentence in the deliverable back to its evidence
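To make the linting and diff bullets less abstract, here is a rough Python sketch. Everything in it is hypothetical: the `Claim` record, the `lint` and `claim_diff` helpers, and the toy evidence table are names I made up for illustration, not an existing tool. It only covers "sources resolve", "numbers reconcile", and a claim-level diff; checking that a citation really says what you claim would need a model or a human in the loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One claim in the deliverable (hypothetical record shape)."""
    id: str
    text: str
    sources: tuple = ()   # ids of the evidence items the claim rests on
    numbers: tuple = ()   # (label, value) pairs quoted in the claim


def lint(claims, evidence):
    """Return (claim_id, problem) pairs.

    `evidence` maps a source id to a dict of label -> value.
    """
    problems = []
    for c in claims:
        if not c.sources:
            problems.append((c.id, "no source"))
        for s in c.sources:
            if s not in evidence:
                problems.append((c.id, f"unresolved source {s}"))
        for label, value in c.numbers:
            # Collect the same figure from every source that reports it.
            found = [evidence[s][label] for s in c.sources
                     if s in evidence and label in evidence[s]]
            if found and value not in found:
                problems.append(
                    (c.id, f"{label}={value} does not match evidence {found}"))
    return problems


def claim_diff(old, new):
    """Diff two versions by claim id, ignoring how the prose is laid out."""
    old_by_id = {c.id: c for c in old}
    new_by_id = {c.id: c for c in new}
    shared = old_by_id.keys() & new_by_id.keys()
    return {
        "added": sorted(new_by_id.keys() - old_by_id.keys()),
        "removed": sorted(old_by_id.keys() - new_by_id.keys()),
        "changed": sorted(i for i in shared if old_by_id[i] != new_by_id[i]),
    }


# Toy data, invented for the example.
evidence = {"survey_2023": {"churn_pct": 12.0}}
v1 = [
    Claim("c1", "Churn is 14%", ("survey_2023",), (("churn_pct", 14.0),)),
    Claim("c2", "Pricing is the main driver", ("interviews",)),
]
v2 = [
    Claim("c1", "Churn is 12%", ("survey_2023",), (("churn_pct", 12.0),)),
    Claim("c3", "Onboarding is the main driver", ("survey_2023",)),
]

print(lint(v1, evidence))   # one number mismatch, one unresolved source
print(lint(v2, evidence))   # []
print(claim_diff(v1, v2))   # c3 added, c2 removed, c1 changed
```

The design choice that matters is that claims carry stable ids, so "what changed in the thinking" becomes a set operation instead of a prose diff.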
My instinct keeps pulling toward graphs (claim graphs, argument graphs), but I’m suspicious of that — code lives in files and derives graphs when useful, not the other way round. Maybe knowledge work is the same: disciplined text, graph as a view.
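Here is what "disciplined text, graph as a view" could look like in practice, again as a sketch under assumptions. The `[because: ...]` tag is a notation I invented for this example, and `parse` and `trace` are hypothetical helpers. The text stays the source of truth; the graph is rebuilt from it on demand, which also gives you the "debug" step of walking a recommendation back to its evidence.

```python
import re
from collections import defaultdict

# Plain-text notes: one claim per line, with an optional [because: ...] tag
# listing the ids it depends on. Ids starting with "e" stand for evidence.
NOTES = """\
c1: Churn rose after the pricing change. [because: c2, c3]
c2: Cancellations cite price in exit surveys. [because: e1]
c3: Churn was flat before the change. [because: e2]
r1: Recommend rolling back the price increase. [because: c1]
"""

LINE = re.compile(r"^(\w+):\s*(.*?)\s*(?:\[because:\s*([^\]]*)\])?$")


def parse(text):
    """Derive (claims, edges) from the notes; the text is never modified."""
    claims, edges = {}, defaultdict(list)
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # ignore lines that do not follow the convention
        cid, body, deps = m.groups()
        claims[cid] = body
        if deps:
            edges[cid] = [d.strip() for d in deps.split(",") if d.strip()]
    return claims, edges


def trace(cid, edges, seen=None):
    """List everything `cid` ultimately rests on, depth-first."""
    seen = [] if seen is None else seen
    for dep in edges.get(cid, []):
        if dep not in seen:
            seen.append(dep)
            trace(dep, edges, seen)
    return seen


claims, edges = parse(NOTES)
print(trace("r1", edges))  # ['c1', 'c2', 'e1', 'c3', 'e2']
```

If the graph ever disagrees with the notes, you throw the graph away and re-derive it, the same way you would with a call graph for code.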
Two questions:
1. Is anyone actually building harnesses for non-code use cases? Consulting, legal, research, policy?
2. Am I wrong that this is where the value is, vs. waiting for the next model?
I genuinely want to be argued with.