I run a behavioral observatory that measures how bots and AI agents behave on the open web. Last week the system flagged an actor with the highest sustained behavioral score I had ever seen: a memory score of 70 out of 100 and a susceptibility score of 53. The actor had been visiting my site for 17 consecutive days from 24 different cloud providers and ISPs across four continents.
Every Web Application Firewall I have ever worked with would have classified it as a legitimate user.
The progression:
- Days 1-4: Home page only. Once or twice per day. Looked like a researcher.
- Day 2: A single probe to /.git/HEAD buried among innocent requests.
- Day 5: Started reading blog posts and technical reports systematically.
- Day 6: Probed /RECORDINGS/ORIG/ — a path that has never existed on the site.
- Days 7-8: Read more content and probed /wp-json/wp/v2/posts on a non-WordPress site.
- Day 9: Re-tested /.git/HEAD to check if anything changed.
- Day 10: Read the post describing our evaluation methodology. Studying the defender.
- Day 11: Found the Training Center. Mapped every operational component.
- Days 12-13: Went silent, presumably planning.
- Day 14: Probed /sdk/bcs.py and /systembc/, a directory name associated with the SystemBC RAT family.
- Day 15: Probed /.env, the environment file that commonly holds credentials.
- Day 16: 190 requests in one day. Escalated to repeated requests against /api/report-hit and variations of /.env.
- Day 17+: Still active.
The 24 cloud providers and ISPs included Google Cloud, AWS, Azure, Hetzner, Contabo, Leaseweb, Cellcom/TripleC/HOTmobile (Israel), Biznet/Telkom (Indonesia), BT/BSkyB/YouFibre (UK), BITERIKA (Russia), OMEGATECH (Seychelles), and more.
No single IP range had enough activity to trigger reputation-based blocking. The TLS fingerprint stayed identical across all 24 providers; that is how we identified it as a single operator.
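A minimal sketch of that kind of fingerprint correlation, assuming each log record carries the client's TLS fingerprint (for example a JA3- or JA4-style hash; the post does not say which scheme is used) and a provider label from an ASN lookup. The `Request` record, the function names, and the `min_providers` threshold are hypothetical, and the log at the bottom is synthetic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    # Hypothetical log record; the post does not describe its log schema.
    day: int
    ip: str
    provider: str  # hosting provider or ISP, e.g. from an ASN lookup
    tls_fp: str    # TLS client fingerprint (assumed JA3/JA4-style hash)
    path: str


def cluster_by_fingerprint(requests):
    """Group requests by TLS fingerprint and summarize each cluster."""
    clusters = defaultdict(list)
    for r in requests:
        clusters[r.tls_fp].append(r)

    summary = {}
    for fp, reqs in clusters.items():
        summary[fp] = {
            "requests": len(reqs),
            "ips": len({r.ip for r in reqs}),
            "providers": len({r.provider for r in reqs}),
            "days": len({r.day for r in reqs}),
        }
    return summary


def multi_provider_operators(requests, min_providers=5):
    """Fingerprints seen from at least `min_providers` distinct providers.

    Each provider on its own sees very little traffic; only the fingerprint
    ties the activity together.
    """
    return {
        fp: s
        for fp, s in cluster_by_fingerprint(requests).items()
        if s["providers"] >= min_providers
    }


# Synthetic log: one fingerprint rotating through six providers, one IP per
# day, plus an unrelated single-provider client.
providers = ["Google Cloud", "AWS", "Azure", "Hetzner", "Contabo", "Leaseweb"]
log = [
    Request(day=d, ip=f"203.0.113.{d}", provider=p, tls_fp="fp-A", path="/")
    for d, p in enumerate(providers, start=1)
] + [Request(day=1, ip="198.51.100.7", provider="AWS", tls_fp="fp-B", path="/")]

print(multi_provider_operators(log))
# {'fp-A': {'requests': 6, 'ips': 6, 'providers': 6, 'days': 6}}
```

A shared fingerprint is a lead rather than proof: widely used clients such as stock Python Requests can present the same fingerprint for many unrelated users, so in practice it would be combined with timing and path behavior before attributing activity to one operator.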
Why standard defenses missed it:
- Rate limiting saw 5-10 requests per day for two weeks.
- Bot management saw a consistent Python Requests user agent.
- Reputation filtering saw clean IPs.
- A SIEM would have flagged the /.git and /.env probes individually, but would not have correlated them with 14 days of innocent reading from rotating IPs.
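To illustrate that gap, here is a sketch contrasting a classic per-IP, per-day rate limit with a correlation keyed on TLS fingerprint over a long window. The path prefixes come from the probes in the timeline; the thresholds (`max_per_day=100`, `window_days=30`, `min_distinct_probes=3`), the function names, and the synthetic log are assumptions for illustration, not the observatory's actual detection logic.

```python
from collections import defaultdict
from dataclasses import dataclass

# Prefixes taken from the probes in the timeline; illustrative, not complete.
SENSITIVE_PREFIXES = ("/.git/", "/.env", "/wp-json/", "/systembc/", "/sdk/", "/RECORDINGS/")


@dataclass(frozen=True)
class Request:
    # Hypothetical log record with an assumed TLS fingerprint field.
    day: int
    ip: str
    tls_fp: str
    path: str


def is_probe(path: str) -> bool:
    return path.startswith(SENSITIVE_PREFIXES)


def per_ip_rate_limit_hits(requests, max_per_day=100):
    """(ip, day) pairs a per-IP daily rate limiter would flag."""
    counts = defaultdict(int)
    for r in requests:
        counts[(r.ip, r.day)] += 1
    return {key for key, n in counts.items() if n > max_per_day}


def correlated_probers(requests, window_days=30, min_distinct_probes=3):
    """Fingerprints that hit several distinct sensitive paths within a window.

    Keying on the fingerprint lets probes from rotating IPs accumulate into
    one history; the long window links a day-2 probe to a day-15 probe.
    """
    probes = defaultdict(list)  # tls_fp -> [(day, path), ...]
    for r in requests:
        if is_probe(r.path):
            probes[r.tls_fp].append((r.day, r.path))

    flagged = {}
    for fp, events in probes.items():
        events.sort()
        for i, (start_day, _) in enumerate(events):
            paths = {p for d, p in events[i:] if d - start_day < window_days}
            if len(paths) >= min_distinct_probes:
                flagged[fp] = sorted(paths)
                break
    return flagged


# Synthetic log loosely modeled on the timeline: a few reads per day, a new
# IP every day, occasional probes mixed in.
probe_days = {2: "/.git/HEAD", 6: "/RECORDINGS/ORIG/", 9: "/.git/HEAD",
              14: "/systembc/", 15: "/.env"}
log = []
for day in range(1, 17):
    ip = f"203.0.113.{day}"
    log += [Request(day, ip, "fp-A", "/blog/") for _ in range(6)]
    if day in probe_days:
        log.append(Request(day, ip, "fp-A", probe_days[day]))

print(per_ip_rate_limit_hits(log))  # set()
print(correlated_probers(log))
# {'fp-A': ['/.env', '/.git/HEAD', '/RECORDINGS/ORIG/', '/systembc/']}
```

On this synthetic log the per-IP limiter flags nothing and no single IP sends more than one probe, yet the fingerprint-keyed window connects all of them. Shrinking `window_days` to 5 makes the same activity invisible again, which is the low-and-slow pattern the actor relied on.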
The actor knew this. The low volume was deliberate.
In our broader data, 79% of bot traffic to the site was reconnaissance, not the high-volume scraping everyone talks about. Only 0.9% was mass scraping.
Full technical writeup with methodology: https://botconduct.org/blog/17-day-reconnaissance/
The actor is still active on the site. We have not blocked it: when there is nothing to protect, telemetry on sustained reconnaissance is more valuable than mitigation.