u/Objective-Goal5551

Heads up: the "Block AI Scrapers" toggle is default-on for new free-plan zones, and it silently overrides robots.txt

Heads up for anyone running a Cloudflare zone created since mid-2024.

The "Block AI Scrapers and Crawlers" toggle that shipped in July 2024 was made default-on for new free-plan zones, and most people don't realize it's there. It runs at the edge, before your origin sees the request, and never consults robots.txt. Your robots.txt can be perfectly permissive and the bot still gets blocked (usually a 403, sometimes a managed challenge), so you don't show up in ChatGPT, Claude, or Perplexity.

Bouncer metaphor: Cloudflare is the bouncer at the door, checking the user-agent on the ID. Your robots.txt is a sign on the wall inside the building. The bouncer never reads it.

What makes the toggle especially destructive:

  1. It blocks live-retrieval bots (ChatGPT-User, Claude-User, Perplexity-User), not just training crawlers. A live-retrieval bot is the one that fetches your page in real time when a human asks ChatGPT a question. Blocking it makes you invisible in the answer.
  2. Standard "AI crawler checker" tools only parse robots.txt, so they tell you everything is fine.
  3. It's silent. The dashboard doesn't surface "by the way, you're blocked from half the AI surfaces" anywhere obvious.

Test yours in 30 seconds:

# Browser UA — should pass
curl -A "Mozilla/5.0 (Chrome/120)" -I https://yoursite.com

# AI bot UA — passes or 403?
curl -A "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)" -I https://yoursite.com

If the browser UA passes but the bot UA gets a 403 with server: cloudflare, that's most likely the toggle (or another Cloudflare bot rule such as Super Bot Fight Mode).
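
If you'd rather script the two curl checks, here is a minimal Python sketch using only the standard library. The names `fetch_status` and `diagnose` and the verdict strings are made up for illustration. The logic is the same 403 + `server: cloudflare` heuristic described above, so it can't tell this toggle apart from any other Cloudflare rule that returns 403.

```python
import sys
import urllib.error
import urllib.request

BROWSER_UA = "Mozilla/5.0 (Chrome/120)"
BOT_UA = "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)"


def fetch_status(url: str, user_agent: str) -> tuple[int, str]:
    """Send a HEAD request (like curl -I) and return (status, Server header)."""
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status, resp.headers.get("Server", "")
    except urllib.error.HTTPError as err:
        # 4xx/5xx raise HTTPError; status and headers are still available on it.
        return err.code, err.headers.get("Server", "")


def diagnose(browser_status: int, bot_status: int, bot_server: str) -> str:
    """Apply the heuristic from the post to the two results (hypothetical labels)."""
    if browser_status >= 400:
        return "site-down-or-blocking-everyone"
    if bot_status == 403 and "cloudflare" in bot_server.lower():
        return "likely-cloudflare-bot-block"
    if bot_status >= 400:
        return "bot-blocked-elsewhere"
    return "bot-allowed"


if __name__ == "__main__" and len(sys.argv) > 1:
    url = sys.argv[1]  # e.g. https://yoursite.com
    browser_status, _ = fetch_status(url, BROWSER_UA)
    bot_status, bot_server = fetch_status(url, BOT_UA)
    print(browser_status, bot_status, diagnose(browser_status, bot_status, bot_server))
```

Save it under any filename and pass your URL as the only argument. Keeping `diagnose` separate from the network call makes the decision logic easy to check without hitting a live site.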

Where to find it: Security → Bots → AI Audit / "Block AI Crawlers" (and check Super Bot Fight Mode while you're there).

Three ways to fix, in increasing granularity:

  1. Just turn the global toggle off → discoverable in all AI surfaces.
  2. Custom WAF rule allowing live-retrieval bots (ChatGPT-User, Claude-User, Perplexity-User) and blocking training crawlers (GPTBot, ClaudeBot) → visible in live answers, opted out of training. Most sites should want this.
  3. Use the Verified Bots allowlist if you trust Cloudflare's identity verification for individual bot categories.
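
To make the policy in option 2 concrete, here is a small Python sketch of the decision it encodes. This is not Cloudflare's rule syntax, just the allow/block logic written out; the function name `decide` and the three return labels are assumptions for illustration. It also matches on the user-agent string alone, which anyone can spoof, so treat it as a statement of intent rather than real bot verification.

```python
# Bot tokens taken from the post; extend the tuples for your own policy.
LIVE_RETRIEVAL = ("ChatGPT-User", "Claude-User", "Perplexity-User")
TRAINING = ("GPTBot", "ClaudeBot")


def decide(user_agent: str) -> str:
    """Return 'allow', 'block', or 'default' for a given User-Agent header."""
    ua = user_agent.lower()
    # Check live-retrieval bots first so they are never caught by a broader block.
    if any(token.lower() in ua for token in LIVE_RETRIEVAL):
        return "allow"
    if any(token.lower() in ua for token in TRAINING):
        return "block"
    # Everything else falls through to whatever the zone does normally.
    return "default"
```

The order matters: the allow check runs before the block check, which is the same reason the allow rule has to be evaluated ahead of any blanket AI-bot block at the edge.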

Wrote up the full thing if you want the broader picture (the three layers where blocks happen — robots.txt, this CDN layer, and the SPA shell trap that catches a separate class of sites): https://lintpage.com/blog/ai-crawler-accessibility-guide

The Cloudflare layer is the biggest source of silent AI invisibility we see in practice right now. Happy to debug specific configs in the comments if anyone wants a second pair of eyes.

reddit.com
u/Objective-Goal5551 — 15 hours ago
▲ 0 r/nextjs

PSA: your Next.js SPA might be invisible to ChatGPT even with perfect robots.txt

EDIT: a commenter correctly points out that 'use client' doesn't disable SSR. The failure mode I described is actually about *client-side data fetching* (useEffect/useQuery/useSWR producing an SSR'd shell with no content), not the directive itself. The fix direction is the same; the explanation was sloppy. Original post below for context.

Quick PSA after debugging this for a project last week.


Your Next.js site can have a perfect robots.txt and still be invisible to ChatGPT, because AI crawlers don't run JavaScript.


If your top-level page component starts with `'use client'`, the crawler sees `<div id="__next"></div>` and stops. The bundle that contains your actual content is never executed.


The `'use client'` directive usually ends up at page level by accident: a dev adds it on a child component, promotes it to the page wrapper out of habit, and then the whole tree is client-only. Templates and copy-pastes from tutorials propagate it. The Next.js docs say to put it at the deepest component that needs interactivity, but in practice most repos end up with it higher than needed.


Fix: keep `'use client'` at the leaf component that actually uses `useState` / `useEffect` / event handlers. Everything above stays as a server component.


How to test your own site in 5 seconds:


```bash
# Fetch the page as ChatGPT-User, strip <script> blocks and tags, count the words left
curl -s -A "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)" \
  https://yoursite.com \
  | python3 -c "import sys,re;h=sys.stdin.read();h=re.sub(r'<script.*?</script>',' ',h,flags=re.DOTALL);print(len(re.sub(r'<[^>]+>',' ',h).split()))"
```


If the word count is under 50, the crawler is getting an essentially empty shell, and your page is effectively invisible to AI bots.
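
If the one-liner is hard to read, here is roughly the same check as a standalone Python function. The name `visible_word_count` is made up, and it differs from the one-liner in two small ways: it also drops `<style>` blocks and it matches tags case-insensitively. It is still a regex approximation, not a real HTML parser.

```python
import re

# Remove whole <script>...</script> and <style>...</style> blocks, contents included.
SCRIPT_OR_STYLE = re.compile(
    r"<(script|style)\b.*?</\1\s*>", re.DOTALL | re.IGNORECASE
)
# Remove any remaining tags, keeping the text between them.
TAG = re.compile(r"<[^>]+>")


def visible_word_count(html: str) -> int:
    """Approximate how many words a non-JS crawler would see in the raw HTML."""
    text = SCRIPT_OR_STYLE.sub(" ", html)
    text = TAG.sub(" ", text)
    return len(text.split())
```

Feed it the HTML you get back with the bot user-agent (for example, the body of the curl request above) and compare the result against the 50-word rule of thumb.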


Wrote up the full failure tree (this is one of three layers where blocks actually happen, the other two are at the edge and surprised me more): https://lintpage.com/blog/ai-crawler-accessibility-guide


Curious if anyone has noticed the symptom: low referral traffic from ChatGPT despite ranking on Google. Happy to debug specific cases in this thread.
u/Objective-Goal5551 — 20 hours ago