u/OriginalBeginning708

Everyone’s talking about harnesses this year, but every example is code — files, lint, tests, diffs, LSP. The harness is doing half the work; same model, same prompt, wildly different results depending on what’s around it.

I work in consulting and I keep thinking: we don’t actually need smarter models. Frontier-level reasoning is already overkill for most knowledge work. What we’re missing is the harness.

But “harness for knowledge work” is harder to picture. The substrate isn’t code, it’s claims + evidence + argument. So what would the equivalents be?

•	Linting = sources resolve, terms are consistent, numbers reconcile, each citation actually says what you claim it does  
•	Tests = adversarial reads, steelman the opposite, invert the recommendation  
•	Diffs = at the claim level, not the prose level (“what changed in the thinking”)  
•	Compile = same substrate, different audience-specific outputs  
•	Debug = trace any sentence in the deliverable back to its evidence
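
To make a few of those equivalents concrete, here is a minimal sketch in Python, assuming the deliverable is kept as explicit claim records that point at evidence ids. The `Claim` class, the `lint`, `diff_claims` and `trace` helpers, and all of the example data are hypothetical names and values made up for illustration, not an existing tool. It only covers the mechanical checks (sources resolve, claim-level diff, trace to evidence); checking that a citation really says what you claim would still need a human or model read.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    id: str
    text: str
    sources: list = field(default_factory=list)  # ids of supporting evidence


def lint(claims, evidence):
    """'Linting': flag claims with no evidence or with sources that do not resolve."""
    problems = []
    for claim in claims.values():
        if not claim.sources:
            problems.append((claim.id, "no evidence cited"))
        for src in claim.sources:
            if src not in evidence:
                problems.append((claim.id, f"source {src} does not resolve"))
    return problems


def diff_claims(old, new):
    """'Diff' at the claim level: which claims were added, dropped, or reworded."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "dropped": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in shared if old[k].text != new[k].text),
    }


def trace(claim_id, claims, evidence):
    """'Debug': walk from a claim back to the evidence it rests on."""
    return [evidence[s] for s in claims[claim_id].sources if s in evidence]


# Made-up example data, purely for illustration.
evidence = {"E1": "Interview notes, SMB customers", "E2": "Churn export"}
draft_v1 = {
    "C1": Claim("C1", "Churn is concentrated in SMB.", ["E1", "E2"]),
    "C2": Claim("C2", "Pricing is the main driver.", ["E9"]),
}
draft_v2 = {
    "C1": Claim("C1", "Churn is concentrated in SMB.", ["E1", "E2"]),
    "C2": Claim("C2", "Pricing is not the main driver.", []),
    "C3": Claim("C3", "Onboarding gaps drive early churn.", ["E1"]),
}

print(lint(draft_v1, evidence))          # C2 cites a source that does not exist
print(diff_claims(draft_v1, draft_v2))   # what changed in the thinking
print(trace("C1", draft_v2, evidence))   # evidence behind claim C1
```

The point of the sketch is that none of this needs a smarter model: once claims and evidence have ids, the checks are a few lines each, and the model's job shrinks to the parts that need judgment.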

My instinct keeps pulling toward graphs (claim graphs, argument graphs), but I’m suspicious of that — code lives in files and derives graphs when useful, not the other way round. Maybe knowledge work is the same: disciplined text, graph as a view.
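
A rough sketch of what "disciplined text, graph as a view" could mean, assuming a made-up plain-text convention where each line is `ID: statement <- supporting ids`. The convention, the `derive_graph` helper and the sample notes are all hypothetical; the idea is just that the text stays the source of truth and the graph is computed on demand.

```python
import re

# Made-up notes in a hypothetical "ID: statement <- supports" convention.
NOTES = """\
C1: Churn is concentrated in the SMB segment. <- E1, E2
C2: Pricing is not the main driver of churn. <- E3
R1: Prioritise SMB onboarding over a price cut. <- C1, C2
"""

LINE = re.compile(r"^(\w+):\s*(.*?)\s*<-\s*(.+)$")


def derive_graph(text):
    """Build {node: [ids it rests on]} from the text; lines that do not match are skipped."""
    graph = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            node, _statement, supports = match.groups()
            graph[node] = [s.strip() for s in supports.split(",")]
    return graph


def rests_on(node, graph):
    """Everything a node ultimately depends on, following support links transitively."""
    seen = []
    stack = list(graph.get(node, []))
    while stack:
        current = stack.pop(0)
        if current not in seen:
            seen.append(current)
            stack.extend(graph.get(current, []))
    return seen


graph = derive_graph(NOTES)
print(graph)
print(rests_on("R1", graph))
```

Editing happens in the text file, same as with code; the graph is thrown away and rebuilt whenever you want to ask a structural question like "what does this recommendation rest on?".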

Two questions:

1.	Is anyone actually building harnesses for non-code use cases? Consulting, legal, research, policy?  
2.	Am I wrong that this is where the value is, vs. waiting for the next model?

Genuinely want to be argued with.

Is “harness engineering” only a coding thing? What does a harness for knowledge work look like?