Why do AI tools mostly cite old Reddit threads?
Has anyone else noticed that most Reddit links cited by LLMs tend to be surprisingly old?
I recently read an article about how Reddit is aggressively blocking AI crawlers from its content through robots.txt. At the same time, it's widely reported that Reddit has data-licensing agreements with Google and OpenAI that give them direct access to Reddit content through its API, rather than relying purely on standard crawling.
This made me wonder how LLMs actually source Reddit content.
When you look at Reddit links surfaced in AI answers, a very common pattern is that many of the cited threads are relatively old, often two or three years, and only rarely recent discussions. This runs counter to the common claim that LLMs tend to favour fresh content.
This could suggest that many LLM systems can no longer access or retrieve fresh Reddit content continuously and at scale. Instead, they may be relying on older indexed snapshots or previously ingested datasets.
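For anyone who wants to check the crawler side of this themselves, robots.txt rules can be evaluated per user agent with Python's standard-library `urllib.robotparser`. The sketch below is only an illustration: the robots.txt content, the `ExampleAIBot` and `SomeBrowser` user agents, and the thread URL are all hypothetical, not Reddit's actual file. To test the real rules you would fetch the live robots.txt instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only; NOT Reddit's real file.
# It blocks one made-up AI crawler and allows every other agent.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

# Hypothetical thread URL, used only to exercise the rules.
THREAD_URL = "https://www.reddit.com/r/example/comments/abc123/some_thread/"


def crawler_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse in-memory lines instead of fetching
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "SomeBrowser"):
        print(agent, crawler_allowed(SAMPLE_ROBOTS_TXT, agent, THREAD_URL))
```

Note that robots.txt only governs crawling. It says nothing about content obtained through a licensed API, which is why the two access paths can diverge.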
Curious if anyone else working on LLM visibility has observed something similar?