Heads up: the "Block AI Scrapers" toggle is default-on for new free-plan zones, and it silently overrides robots.txt
Heads up for anyone running a Cloudflare zone created since mid-2024.
The "Block AI Scrapers and Crawlers" toggle that shipped in July 2024 was made default-on for new free-plan zones, and most people don't realize it's there. It is enforced at Cloudflare's edge, before the request ever reaches your origin, so robots.txt never comes into play. Your robots.txt can be perfectly permissive and the bot still gets blocked (usually a 403, sometimes a managed challenge response), which means you don't show up in ChatGPT, Claude, or Perplexity.
Bouncer metaphor: Cloudflare is the bouncer at the door, checking the user-agent on the ID. Your robots.txt is a sign on the wall inside the building. The bouncer never reads it.
What makes the toggle especially destructive:
- It blocks live-retrieval bots (ChatGPT-User, Claude-User, Perplexity-User), not just training crawlers. Live-retrieval is the bot that fetches your page right now when a human asks ChatGPT a question. Blocking it makes you invisible in the answer.
- Standard "AI crawler checker" tools only parse robots.txt, so they tell you everything is fine.
- It's silent. The dashboard doesn't surface "by the way, you're blocked from half the AI surfaces" anywhere obvious.
Test yours in 30 seconds:
```shell
# Browser UA: should pass
curl -A "Mozilla/5.0 (Chrome/120)" -I https://yoursite.com
# AI bot UA: passes or 403?
curl -A "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)" -I https://yoursite.com
```
If the browser UA passes but the bot UA gets a 403 with server: cloudflare in the response headers, the toggle is the most likely culprit.
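If you want to check more than one bot, here is a rough shell sketch built on the same curl test. The function names (`classify_status`, `check_ua`) are made up for illustration, and the status-code buckets are my own assumption about what counts as "blocked". Also keep in mind that a spoofed UA sent from your own IP is only a hint, since the real bot arrives from the vendor's addresses and may be treated differently.

```shell
# classify_status: map an HTTP status code to a rough verdict.
# The buckets below are an assumption, not anything Cloudflare defines.
classify_status() {
  case "$1" in
    2??|3??)     echo "allowed" ;;
    403|429|503) echo "blocked-or-challenged" ;;
    *)           echo "inconclusive" ;;
  esac
}

# check_ua: fetch a URL with a given user-agent and print "<code> <verdict>".
# $1 = URL, $2 = user-agent string.
# Note: this sends a GET and discards the body, unlike the -I (HEAD) test above.
check_ua() {
  code=$(curl -s -o /dev/null -w '%{http_code}' -A "$2" "$1")
  printf '%s\t%s\n' "$code" "$(classify_status "$code")"
}

# Example (makes a real request, so it is left commented out):
#   check_ua https://yoursite.com "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)"
```

Run `check_ua` once with a browser UA and once per bot UA you care about. A browser UA that comes back "allowed" next to a bot UA that comes back "blocked-or-challenged" is the pattern to look for.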
Where to find it: Security → Bots → AI Audit / "Block AI Crawlers" (and check Super Bot Fight Mode while you're there).
Three ways to fix, in increasing granularity:
- Just turn the global toggle off → discoverable in all AI surfaces.
- Custom WAF rule allowing live-retrieval bots (ChatGPT-User, Claude-User, Perplexity-User) and blocking training crawlers (GPTBot, ClaudeBot) → visible in live answers, opted out of training. Most sites should want this.
- Use the Verified Bots allowlist if you trust Cloudflare's identity verification for individual bot categories.
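If you go with the custom-rule option, a small sketch like this can sanity-check the result afterwards. It encodes the intended policy (live-retrieval allowed, training crawlers blocked) and compares it with the status code you got back from curl. To be clear, this is not Cloudflare rule syntax, just a local check. `expected_verdict`, `observed_verdict`, and `policy_matches` are hypothetical helper names, and treating 2xx as "allow" and 403 as "block" is a simplifying assumption.

```shell
# expected_verdict: the policy you intended, keyed on a user-agent substring.
expected_verdict() {
  case "$1" in
    *ChatGPT-User*|*Claude-User*|*Perplexity-User*) echo "allow" ;;
    *GPTBot*|*ClaudeBot*)                           echo "block" ;;
    *)                                              echo "unspecified" ;;
  esac
}

# observed_verdict: what a status code suggests actually happened (assumed mapping).
observed_verdict() {
  case "$1" in
    2??) echo "allow" ;;
    403) echo "block" ;;
    *)   echo "unclear" ;;
  esac
}

# policy_matches: succeeds when the observed status agrees with the intended policy.
# $1 = user-agent string, $2 = HTTP status code seen for that user-agent.
policy_matches() {
  [ "$(expected_verdict "$1")" = "$(observed_verdict "$2")" ]
}

# Example (feed in a code from a real request, e.g. curl -s -o /dev/null -w '%{http_code}'):
#   policy_matches "Mozilla/5.0 (compatible; GPTBot/1.0)" 403 && echo "as intended"
```

If ChatGPT-User comes back 403 or GPTBot comes back 200, the rule is not doing what you meant, so check rule order and whether the global toggle is still on.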
Wrote up the full thing if you want the broader picture (the three layers where blocks happen — robots.txt, this CDN layer, and the SPA shell trap that catches a separate class of sites): https://lintpage.com/blog/ai-crawler-accessibility-guide
The Cloudflare layer is the biggest source of silent AI invisibility we see in practice right now. Happy to debug specific configs in the comments if anyone wants a second pair of eyes.