
5 ways AI drafts break specifically in non-fiction (and the structural gates I shipped for each)
Been building a desktop AI editorial pipeline (BookForge, at Synaptrix AI) for the past year and running it across a lot of AI-drafted non-fiction — biographies, technical books, narrative non-fiction, memoir with research, business books. The same five failure patterns kept showing up at the late editorial passes, and they're meaningfully different from the patterns that show up in fiction. Posting them in case they're useful, and curious whether your list overlaps.
- **Hallucinated citations look correct enough to pass a 10-second skim.** This is THE non-fiction-specific failure mode. AI will write a footnote like *"Smith, 2019, page 47, The Theory of X"*: fully plausible, fully fabricated. The only fix that's actually worked for us is to take citation generation away from the model entirely. Drafting passes write opaque tokens (`[[claim:source-id]]`) inline against a source library you maintain. A separate deterministic step (pure code, no model call) walks the manuscript, resolves each token against the library, rewrites them as numbered footnotes, and builds the bibliography. If a token has no match, the run fails loudly with the file and line of the unresolved token. Footnotes can't be invented; only the placement of a `[[claim:id]]` in prose can be wrong, and that's caught by fact-check. Whether your bibliography is real becomes a function of code, not generative grace.
- **Source conflation across an extended project.** Smith-2019 and Smith-2024 get confused mid-paragraph. A point Karpathy made in his 2024 essay gets attributed to his 2022 talk. AI is locally consistent inside a paragraph and globally sloppy across 100,000 words. The mechanic: a canonical references store where every source (and every character, place, claim, concept) has a stable ID, and the audit chain reads against it on every chapter pass. The fact-checker doesn't generate citations from memory; it checks claims against what's in the library. Errors don't get caught generatively; they get caught by structural cross-reference.
- **The thesis drifts across chapters in ways fiction's plot doesn't.** Fiction has scene-by-scene structure readers anchor on; non-fiction has an argument that has to compound. AI drafts often start strong on the thesis in chapter 1, restate it slightly differently in chapter 4, contradict it implicitly in chapter 9, and pretend they agreed all along by the conclusion. The mechanic: the thesis from Discovery gets stored in the canonical references and re-injected into every chapter's audit pass; the supervisor pass explicitly flags claims that contradict it. Not perfect (argumentative drift is genuinely the hardest of the five), but it catches the egregious cases, and "egregious cases" is where most non-fiction loses its credibility with reviewers.
- **Bibliography style drift across 100+ entries.** A long non-fiction project ends up with a bibliography that has to be format-consistent: same citation style, dates in the same shape, page ranges separated the same way, *et al.* vs. full author lists handled uniformly, ibid./op. cit. discipline if applicable. AI handles individual entries fine; it doesn't enforce consistency across 100. The fix: because the bibliography is built deterministically from the source library at the end (one step, one template per style), the formatting is enforced by code at compile time, not by the model during drafting. Every entry uses the same template. Consistency by construction. Chicago Notes-Bibliography is the default we use for trade non-fiction; the style choice is a config flip, not a re-render.
- **The fact-check pass is the one nobody runs (and it's the load-bearing one for non-fiction).** Authors trust the draft + a copyedit. The fact-check pass (the one that re-reads every claim against the source library, flags hand-wavy attribution, flags missing citation tokens on quantitative claims, and flags expert paraphrases that don't match the source) is the one that's most replaceable by an AI step and the one most often skipped in DIY pipelines. We made it mandatory in the chain. It consistently surfaces 15-30 things per book that a tired author would have missed and a reader (or, worse, a reviewer with subject expertise) would have noticed.
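
For anyone who wants to try the token-resolution idea from the first and fourth points, here's a minimal sketch of what a deterministic compile step could look like. This is not BookForge's actual code. The `Source` fields, the `compile_citations` function, the `[^n]` footnote marker style, and the one-line entry template are all assumptions made for illustration, and the template is a simplified stand-in rather than real Chicago formatting. The only thing taken from the post is the `[[claim:source-id]]` token shape and the fail-loudly-with-file-and-line behavior.

```python
import re
from dataclasses import dataclass

# Token shape follows the post's `[[claim:source-id]]` example; the allowed
# id characters are an assumption.
TOKEN = re.compile(r"\[\[claim:([A-Za-z0-9_.-]+)\]\]")


@dataclass(frozen=True)
class Source:
    # Hypothetical minimal record; a real library would carry more fields.
    author: str
    title: str
    publisher: str
    year: int


class UnresolvedCitation(Exception):
    """Raised when a token has no matching id in the source library."""


def format_entry(src: Source) -> str:
    # One template for every entry, so formatting is consistent by construction.
    # Simplified illustrative shape, not a full citation style.
    return f"{src.author}. {src.title}. {src.publisher}, {src.year}."


def compile_citations(files: dict[str, str], library: dict[str, Source]):
    """Resolve every token; return (compiled_files, footnotes, bibliography)."""
    notes: list[str] = []   # one footnote per token occurrence, in reading order
    cited: set[str] = set() # ids that actually appear in the manuscript
    compiled: dict[str, str] = {}

    for path, text in files.items():
        out_lines = []
        for lineno, line in enumerate(text.split("\n"), start=1):

            def repl(match, path=path, lineno=lineno):
                source_id = match.group(1)
                if source_id not in library:
                    # Fail loudly with the file and line of the bad token.
                    raise UnresolvedCitation(
                        f"{path}:{lineno}: unresolved token {match.group(0)}"
                    )
                cited.add(source_id)
                notes.append(format_entry(library[source_id]))
                return f"[^{len(notes)}]"

            out_lines.append(TOKEN.sub(repl, line))
        compiled[path] = "\n".join(out_lines)

    footnotes = [f"[^{i}]: {note}" for i, note in enumerate(notes, start=1)]
    bibliography = sorted({format_entry(library[sid]) for sid in cited})
    return compiled, footnotes, bibliography
```

The design point is that the model never touches this function's output: it can only place an id, and an id either exists in the library or stops the run. Swapping citation styles would mean swapping `format_entry`, with the manuscript text left alone.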
Two things I'd love takes on:
**For non-fiction authors with research-heavy projects (50+ sources, 100+ footnotes):** what's your current workflow for source provenance? Zotero / Citavi / Mendeley / a spreadsheet / a folder of PDFs with hand-typed citations? Curious where the friction is for you, because I think the unaddressed pain in citation tooling is bigger than tooling vendors realise.
**What pattern am I missing?** Five was the convergent set after a year of pipeline runs across a few dozen test projects. I'm sure non-fiction has more. What do you keep catching that I haven't?
Disclosure: BookForge is a desktop app currently in free open beta. Runs on your own Claude.ai subscription or Anthropic API key. Manuscript stays on your disk — we never see it. I'm one of the people building it at Synaptrix AI. Windows today; macOS / Linux in active development.