Using Multiple AI Agents To Audit Each Other
I want to share a workflow I have been using for AI-assisted physics ideas. It is aimed at people trying to turn observations or conceptual models into something closer to a scientific paper.
A common mistake I see is depending on one AI engine for everything: gathering background, shaping the idea, writing the argument, reviewing the logic, and polishing the final wording. The problem is that the same engine that helps make the draft sound coherent can also hide weak assumptions, overstate confidence, miss technical issues, or smooth over gaps in the reasoning.
My workflow is different.
I try to separate the AI roles:
- One engine for gathering background, terminology, references, and existing literature.
- One or two engines for helping craft the idea into a structured argument.
- One deliberately critical or “non-pleasing” engine for auditing the result.
- Additional engines for final review, mainly to catch hidden mistakes, unclear wording, or scientific overclaims.
The most useful part is the closed feedback loop. For example, I may ask one engine to audit and correct another engine's draft of my idea. Then I take that feedback back to the drafting engine and ask it to revise. After that, I may consult other engines such as DeepSeek, Gemini, or Grok to look for hidden scientific problems or weak wording.
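Here is a minimal sketch of that loop in Python, in case it helps to see it written down. Everything in it is hypothetical: `Model` is just "any function that takes a prompt and returns text", the prompts are placeholders, and the "NO ISSUES" stop phrase is a convention I made up for the example. It is not tied to any particular vendor's API.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in: any function mapping a prompt to a text reply.
# In practice this would wrap whichever model you actually call.
Model = Callable[[str], str]


def audit_loop(
    idea: str, drafter: Model, auditor: Model, max_rounds: int = 3
) -> Tuple[str, List[str]]:
    """Draft, audit, and revise until the auditor has no issues or rounds run out."""
    draft = drafter(f"Draft a structured argument for this idea:\n{idea}")
    critiques: List[str] = []
    for _ in range(max_rounds):
        critique = auditor(
            "Act as a skeptical reviewer. List concrete problems, "
            f"or reply NO ISSUES.\n{draft}"
        )
        # Assumed stop convention: the auditor replies exactly "NO ISSUES".
        if critique.strip().upper() == "NO ISSUES":
            break
        critiques.append(critique)
        # Feed the critique back to the same drafting model for revision.
        draft = drafter(
            f"Revise the draft to address this critique.\n\n"
            f"Critique:\n{critique}\n\nDraft:\n{draft}"
        )
    return draft, critiques
```

The returned list of critiques is worth keeping, since it shows what was actually challenged and changed. The `max_rounds` cap matters too: an auditor that never says "NO ISSUES" should not keep the loop running forever.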
The point is not that multiple AI engines magically produce truth. They do not. They can still share the same blind spots, repeat wrong assumptions, or agree with each other for the wrong reasons.
The point is that role separation helps.
An engine asked to draft tends to optimize for coherence. An engine asked to audit is pushed toward pressure-testing. A different model may notice a weakness the first two missed. The human still has to judge the final result.
For AI-generated physics work, I think this distinction is important:
AI should not only be used to write. It should also be used to attack the writing.
Before publishing or sharing any AI-assisted scientific text, I think we should ask:
- Did another model try to falsify the argument?
- Did a critical model check the assumptions?
- Were equations, dimensions, and claims independently reviewed?
- Were references checked rather than only generated?
- Did the final version become more cautious after review?
- Is the use of AI transparent?
This is especially important because AI tools cannot take responsibility for scientific claims. The responsibility remains with the human author or researcher. AI can assist, but it should not replace verification.
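As one example of verification that does not depend on any model: the dimension check from the list above can be done mechanically. Below is a small sketch, assuming only mass, length, and time as base dimensions. The helper names (`mul`, `power`, `same_dimension`) are my own for illustration, not from any library.

```python
from typing import Tuple

# A dimension is a tuple of exponents for (mass, length, time).
Dim = Tuple[int, ...]

MASS: Dim = (1, 0, 0)
LENGTH: Dim = (0, 1, 0)
TIME: Dim = (0, 0, 1)


def mul(a: Dim, b: Dim) -> Dim:
    """Multiplying two quantities adds their dimension exponents."""
    return tuple(x + y for x, y in zip(a, b))


def power(a: Dim, n: int) -> Dim:
    """Raising a quantity to the n-th power scales its exponents by n."""
    return tuple(x * n for x in a)


def same_dimension(lhs: Dim, rhs: Dim) -> bool:
    """Both sides of an equation must carry identical dimensions."""
    return lhs == rhs


VELOCITY = mul(LENGTH, power(TIME, -1))  # L T^-1
ENERGY: Dim = (1, 2, -2)                 # M L^2 T^-2

# E = m c^2 passes the check; E = m c does not.
ok = same_dimension(ENERGY, mul(MASS, power(VELOCITY, 2)))
bad = same_dimension(ENERGY, mul(MASS, VELOCITY))
```

A passing dimension check does not make an equation right, but a failing one means it is wrong, whatever a fluent model says about it.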
My suggested workflow:
Idea → Gathering AI → Drafting AI → Critical Audit AI → Revision → External Model Review → Human Final Check
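The chain above can be sketched as an ordered list of named stages, each taking the current text and returning the next version. This is only an illustration under my own assumptions: `run_pipeline` and the stage functions are hypothetical, and the last stage should be an actual person reading the result, not a function call.

```python
from typing import Callable, List, Tuple

# A stage is a (name, function) pair; the function transforms the current text.
Stage = Tuple[str, Callable[[str], str]]


def run_pipeline(idea: str, stages: List[Stage]) -> Tuple[str, List[Tuple[str, str]]]:
    """Run each stage in order and keep every intermediate version."""
    text = idea
    history: List[Tuple[str, str]] = []
    for name, step in stages:
        text = step(text)
        history.append((name, text))  # record what each stage produced
    return text, history
```

Keeping the `history` instead of only the final text makes it possible to see which stage introduced or removed a claim.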
For this subreddit, I think it would be useful if people sharing AI-generated physics papers or theories also shared a short “AI audit trail,” for example:
- Which model drafted the idea?
- Which model reviewed it?
- What major criticism was found?
- What was changed after the criticism?
- What claims remain uncertain?
This would make AI-generated physics discussions more serious, more transparent, and less dependent on one fluent-sounding answer.
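If it helps, the audit trail questions above could be kept as a small structured record and pasted into a post. A rough sketch follows, with field names I made up to mirror the five questions. Nothing here is a standard format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AuditTrail:
    """Hypothetical record mirroring the five audit-trail questions."""
    drafted_by: str
    reviewed_by: List[str]
    major_criticism: str
    changes_made: str
    open_uncertainties: List[str] = field(default_factory=list)

    def to_post(self) -> str:
        """Render the trail as a short plain-text list for a forum post."""
        uncertain = "; ".join(self.open_uncertainties) or "none listed"
        lines = [
            "AI audit trail",
            f"- Drafted by: {self.drafted_by}",
            f"- Reviewed by: {', '.join(self.reviewed_by)}",
            f"- Major criticism: {self.major_criticism}",
            f"- Changes after criticism: {self.changes_made}",
            f"- Still uncertain: {uncertain}",
        ]
        return "\n".join(lines)
```

Calling `to_post()` on a filled-in record gives a six-line block that can go at the bottom of a submission.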
Curious to hear how others here are using multiple models. Are you using AI only as a writer, or also as a reviewer?