I Compared Citations Across 3 AI Models on 150 Queries — Only 8% Agreement. Is Anyone Tracking This?
Here's something that genuinely surprised me.
I ran the same 150 informational queries across ChatGPT, Gemini, and Perplexity over a two-week period. The question was simple: how often do all three models cite the same source for the same query?
The answer: 8%.
In another 12% of queries, exactly two of the three models shared at least one source. The remaining 80%? Every model cited something completely different.
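For anyone who wants to run the same kind of tally, here is a minimal sketch of how those three buckets could be computed. It assumes citations are compared at the domain level and that each query's results are stored as one set of cited domains per model. The function names (`classify_overlap`, `overlap_rates`) and the sample data are hypothetical, for illustration only, and are not the script behind the numbers above.

```python
from itertools import combinations


def classify_overlap(citations_by_model):
    """Classify one query by how many models share a cited domain.

    citations_by_model maps a model name to the set of domains it cited.
    Returns "all" if every model shares at least one domain, "two" if
    only some pair of models does, and "none" otherwise.
    """
    sets = list(citations_by_model.values())
    if sets and set.intersection(*sets):
        return "all"
    if any(a & b for a, b in combinations(sets, 2)):
        return "two"
    return "none"


def overlap_rates(results):
    """Return the share of queries falling into each overlap bucket."""
    counts = {"all": 0, "two": 0, "none": 0}
    for per_model in results.values():
        counts[classify_overlap(per_model)] += 1
    total = len(results)
    if total == 0:
        return {bucket: 0.0 for bucket in counts}
    return {bucket: n / total for bucket, n in counts.items()}


# Hypothetical sample data: query -> model -> set of cited domains.
sample = {
    "q1": {"chatgpt": {"a.com"}, "gemini": {"a.com", "b.com"}, "perplexity": {"a.com"}},
    "q2": {"chatgpt": {"a.com"}, "gemini": {"c.com"}, "perplexity": {"c.com"}},
    "q3": {"chatgpt": {"a.com"}, "gemini": {"b.com"}, "perplexity": {"c.com"}},
    "q4": {"chatgpt": {"d.com"}, "gemini": {"e.com"}, "perplexity": {"f.com"}},
}
rates = overlap_rates(sample)
```

Matching on exact URLs instead of domains would be stricter and would likely push the agreement numbers even lower, so it is worth stating which one you use when comparing results.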
A few patterns stood out that I wanted to share:
**ChatGPT** leaned heavily toward established publishers — major news sites, university domains, Wikipedia. It played it safe. About 65% of its citations came from domains with 10+ million monthly visitors.
**Gemini** was the most eclectic. It cited small blogs, niche forums, and individual Substack writers at rates I didn't expect. Roughly 30% of its sources would never appear in a ChatGPT answer for the same query.
**Perplexity** sat somewhere in between but had a clear preference for recent content — 58% of its citations were from pages updated within the last 90 days. The other two models didn't show that recency bias nearly as strongly.
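The recency figure is the easiest of these to reproduce. A rough sketch, assuming you have already collected a last-updated date for each cited page (how you get that date is the hard part, and is not covered here). The function name `recent_share` and the example dates are hypothetical.

```python
from datetime import date, timedelta


def recent_share(last_updated_dates, as_of, window_days=90):
    """Return the fraction of cited pages updated within the window.

    last_updated_dates: list of datetime.date, one per cited page.
    as_of: the date the queries were run.
    """
    if not last_updated_dates:
        return 0.0
    cutoff = as_of - timedelta(days=window_days)
    recent = sum(1 for d in last_updated_dates if d >= cutoff)
    return recent / len(last_updated_dates)


# Hypothetical example: four cited pages, checked on 2024-06-30.
pages = [date(2024, 6, 1), date(2024, 1, 1), date(2024, 5, 15), date(2023, 12, 1)]
share = recent_share(pages, as_of=date(2024, 6, 30))
```

Running the same function over each model's citations separately makes the recency gap between models directly comparable.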
What this means in practice: if you're optimizing for AI citations, targeting a single model is a real strategic choice with a real trade-off. The overlap is so low that optimizing for one model almost certainly leaves the other two untouched.
But here's where I'm stuck, and I'm genuinely curious what others think:
Is it better to optimize specifically for one model's preferences and dominate there, or to spread your efforts across all three? I've seen solid arguments for both approaches, but I haven't found anyone actually tracking the ROI of one against the other.
Anyone else measuring cross-model citation overlap? What are you seeing?