u/BotConductStandard — reddlx

▲ 9 r/botwatch+2 crossposts

I run a behavioral observatory that measures how bots and AI agents behave on the open web. Last week the system flagged an actor with the highest sustained behavioral score I had ever seen: a memory score of 70 out of 100 and a susceptibility score of 53. The actor had been visiting my site for 17 consecutive days from 24 different cloud providers and ISPs across four continents.

Every Web Application Firewall I have ever worked with would have classified it as a legitimate user.

The progression:

- Days 1-4: Mostly the home page, once or twice per day. Looked like a researcher.

- Day 2: A single probe to /.git/HEAD buried among innocent requests.

- Day 5: Started reading blog posts and technical reports systematically.

- Day 6: Probed /RECORDINGS/ORIG/ — a path that has never existed on the site.

- Days 7-8: Read more content and probed /wp-json/wp/v2/posts (a WordPress REST API path) on a non-WordPress site.

- Day 9: Re-tested /.git/HEAD to check if anything changed.

- Day 10: Read the post describing our evaluation methodology. Studying the defender.

- Day 11: Found the Training Center. Mapped every operational component.

- Days 12-13: Went silent. Planning.

- Day 14: Probed /sdk/bcs.py and /systembc/ — a known RAT family directory.

- Day 15: Probed /.env, the file where many web apps store credentials.

- Day 16: 190 requests in one day. Escalated to attacking /api/report-hit and /.env with variations.

- Day 17+: Still active.

24 cloud providers and ISPs used: Google Cloud, AWS, Azure, Hetzner, Contabo, Leaseweb, Cellcom/TripleC/HOTmobile (Israel), Biznet/Telkom (Indonesia), BT/BSkyB/YouFibre (UK), BITERIKA (Russia), OMEGATECH (Seychelles), and more.

No single IP range had enough activity to trigger any reputation-based blocking. The TLS fingerprint stayed identical across all 24 providers — that is how we identified it as one operator.

Why standard defenses missed it: rate limiting saw 5-10 requests/day for two weeks. Bot management saw a consistent Python Requests UA. Reputation filtering saw clean IPs. SIEM would have caught the /.git and /.env probes individually but not correlated them with 14 days of innocent reading from rotating IPs.

The actor knew this. The low volume was deliberate.
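The correlation that the individual tools missed can be sketched in a few lines. This is a minimal illustration, not the observatory's pipeline: it assumes each request has already been tagged with a TLS fingerprint and the provider that owns its IP, and the probe list and thresholds are made-up values. Grouping by fingerprint instead of by IP is what turns 17 quiet days across 24 providers into one actor.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical probe list; not the observatory's actual rules.
SENSITIVE_PREFIXES = ("/.git/", "/.env", "/systembc/", "/wp-json/")


@dataclass(frozen=True)
class Hit:
    day: int        # day index of the request
    ip: str
    provider: str   # owner of the IP (cloud/ISP), resolved beforehand
    tls_fp: str     # TLS fingerprint, e.g. a JA4 hash
    path: str


def correlate(hits, min_days=7, min_providers=3, min_probes=2):
    """Group requests by TLS fingerprint and flag low-and-slow actors.

    No single IP or day needs to look busy; the flag fires when one
    fingerprint spans many days and providers and has sensitive probes.
    """
    by_fp = defaultdict(list)
    for h in hits:
        by_fp[h.tls_fp].append(h)

    flagged = {}
    for fp, group in by_fp.items():
        days = {h.day for h in group}
        providers = {h.provider for h in group}
        probes = {h.path for h in group if h.path.startswith(SENSITIVE_PREFIXES)}
        if (len(days) >= min_days and len(providers) >= min_providers
                and len(probes) >= min_probes):
            flagged[fp] = {
                "days": len(days),
                "providers": len(providers),
                "ips": len({h.ip for h in group}),
                "probes": sorted(probes),
            }
    return flagged
```

Per IP and per day, each of these requests can stay under any rate limit; the signal only appears once the fingerprint is the grouping key.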

In our broader data: 79% of bot traffic to the site was reconnaissance — not the high-volume scraping everyone talks about. Only 0.9% was mass scraping.

Full technical writeup with methodology: https://botconduct.org/blog/17-day-reconnaissance/

The actor is still active on the site. We have not blocked it: telemetry on sustained reconnaissance is more valuable than mitigation when there is nothing to protect.

reddit.com

u/BotConductStandard — 24 days ago

▲ 0 r/PinoyProgrammer

[ Removed by Reddit on account of violating the content policy. ]


u/BotConductStandard — 26 days ago

▲ 7 r/botwatch+2 crossposts

We run an independent observatory that measures how bots and AI agents behave on the open web. Last week we caught something that's worth writing about.

## The pattern

It started with a TLS fingerprint that kept showing up across different IP addresses. Same handshake, same parameters, same JA4 hash: `t13d311100_e8f1e7e78f70_d41ae481755e`.

That fingerprint is interesting on its own. It tells you the client uses TLS 1.3, with 31 cipher suites and 11 extensions. But the part that matters is the ALPN field. It's empty.
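The prefix of a JA4 hash (the `JA4_a` section, before the first underscore) is plain text, so it can be decoded directly. A minimal sketch following the field layout in the published JA4 spec; the class and function names are illustrative:

```python
from typing import NamedTuple


class Ja4Prefix(NamedTuple):
    transport: str        # "t" = TLS over TCP, "q" = QUIC
    tls_version: str      # e.g. "13" for TLS 1.3
    sni: str              # "d" = SNI hostname sent, "i" = no SNI (IP address)
    cipher_count: int
    extension_count: int
    alpn: str             # first+last char of first ALPN value, "00" if none


def parse_ja4_a(ja4: str) -> Ja4Prefix:
    """Split the first (JA4_a) section of a JA4 fingerprint into its fields."""
    a = ja4.split("_", 1)[0]
    if len(a) != 10:
        raise ValueError(f"unexpected JA4_a section: {a!r}")
    return Ja4Prefix(
        transport=a[0],
        tls_version=a[1:3],
        sni=a[3],
        cipher_count=int(a[4:6]),
        extension_count=int(a[6:8]),
        alpn=a[8:10],
    )


fp = parse_ja4_a("t13d311100_e8f1e7e78f70_d41ae481755e")
print(fp.cipher_count, fp.extension_count, fp.alpn)  # 31 11 00
print("no ALPN offered" if fp.alpn == "00" else f"ALPN {fp.alpn}")  # no ALPN offered
```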

Real browsers always advertise ALPN. Chrome, Firefox, and Safari all offer `h2` (alongside `http/1.1`), because every modern browser supports HTTP/2. A client that connects with TLS 1.3 in 2026 and announces no ALPN is almost certainly not a browser. It's most likely an HTTP library: Go's net/http with a custom TLS config, Python's requests with custom TLS, something in that family.

So we already knew: not a browser. Whatever was visiting us was pretending to be one.

## What it was pretending to be

The user agents told the rest of the story. The same JA4 fingerprint cycled through 13 different browser identities: Chrome 135 on Windows, Chrome 135 with Edge, Chrome 134 on Mac, Firefox 137, Safari 18.3, Safari 18.2, Chrome with Adguard, Chrome 131, Chrome 130, Chrome 116, ChromeOS, and two others.

Thirteen browsers. One TLS handshake. The math doesn't work. Real users don't have thirteen browsers, and Chrome, Firefox, and Safari ship different TLS stacks, so they can't produce the same fingerprint. Someone built a list of common user agents and rotated through them on every request, while the underlying software stayed the same. That's deliberate. That's evasion.
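One way to surface this kind of rotation is to group user agents by fingerprint and ask how many distinct TLS stacks they claim. A rough sketch, assuming UA substrings are a good enough proxy for the claimed browser family (they are trivially spoofed, which is the point: the mismatch with the fingerprint is the signal). The thresholds and the family mapping are assumptions:

```python
from collections import defaultdict


def tls_stack_family(user_agent: str) -> str:
    """Coarse guess at which TLS stack a UA string claims (heuristic)."""
    if "Firefox/" in user_agent:
        return "firefox"    # Gecko / NSS
    if "Chrome/" in user_agent:
        return "chromium"   # Chrome, Edge, ChromeOS and other Chromium builds
    if "Safari/" in user_agent and "Version/" in user_agent:
        return "safari"     # Apple's stack
    return "other"


def rotating_fingerprints(observations, min_distinct_uas=5):
    """observations: iterable of (ja4, user_agent) pairs.

    Flags a fingerprint if it presents many distinct UAs, or UAs that
    claim more than one TLS stack (which one real client cannot do).
    """
    uas = defaultdict(set)
    for ja4, ua in observations:
        uas[ja4].add(ua)

    flagged = {}
    for ja4, seen in uas.items():
        families = {tls_stack_family(ua) for ua in seen} - {"other"}
        if len(seen) >= min_distinct_uas or len(families) > 1:
            flagged[ja4] = (len(seen), sorted(families))
    return flagged
```

Chrome and Edge deliberately land in the same family here, since both are Chromium and can legitimately share a fingerprint.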

## Where it was coming from

We pulled the IPs and ran them through ARIN. The allocation 47.74.0.0–47.87.255.255 is assigned to Alibaba Cloud LLC (AL-3). All 107 connections from this fingerprint to our site originated from rented infrastructure inside that allocation.

So we knew whose infrastructure it was. We didn't know who rented it. Alibaba Cloud doesn't publish customer information, so the trail stops at the cloud provider's perimeter.
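The ARIN range quoted above is not a single CIDR block, which matters if you want to match log entries against it. Python's standard `ipaddress` module can split it into the covering networks; a minimal sketch (the helper name is illustrative):

```python
import ipaddress

# Range as quoted from the ARIN record (AL-3, Alibaba Cloud LLC).
AL3 = list(ipaddress.summarize_address_range(
    ipaddress.IPv4Address("47.74.0.0"),
    ipaddress.IPv4Address("47.87.255.255"),
))
print([str(net) for net in AL3])
# ['47.74.0.0/15', '47.76.0.0/14', '47.80.0.0/13']


def in_allocation(ip: str, networks) -> bool:
    """True if ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```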

## The detail that made it worse

While we were looking at the Alibaba traffic, the same JA4 fingerprint appeared once on a different IP: `3.91.x.x`. That block belongs to Amazon Web Services, us-east-1.

One hit. Same fingerprint. Different cloud.

That changes the picture. It's not a bot operating from Alibaba Cloud. It's a bot whose operator runs the same software across multiple cloud providers. Multi-cloud isn't a coincidence. It's how you build infrastructure that's hard to take down and hard to attribute.
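AWS publishes its address space as a JSON file (`https://ip-ranges.amazonaws.com/ip-ranges.json`), where each IPv4 entry under `prefixes` carries `ip_prefix`, `region`, and `service`. A sketch of looking up a logged IP against it; the function names are hypothetical, and a match only tells you the address is AWS space in a given region, not who rented it:

```python
import ipaddress
import json
import urllib.request

AWS_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def load_aws_ranges(url: str = AWS_RANGES_URL) -> dict:
    """Download and parse AWS's published IP range file."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def aws_matches(ip: str, ranges: dict):
    """Return (prefix, region, service) for every IPv4 prefix containing ip,
    most specific prefix first."""
    addr = ipaddress.ip_address(ip)
    hits = [
        (e["ip_prefix"], e["region"], e["service"])
        for e in ranges.get("prefixes", [])
        if addr in ipaddress.ip_network(e["ip_prefix"])
    ]
    hits.sort(key=lambda h: ipaddress.ip_network(h[0]).prefixlen, reverse=True)
    return hits
```

The same prefix usually appears more than once (for example under both `AMAZON` and `EC2`), so expect several rows per address.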

## What it was doing

The behavior on our site was consistent with content harvesting. The bot repeatedly accessed paths that no organic visitor would reach. It never requested robots.txt. Not once across 107 connections. It never identified itself as a bot in any user agent. It hardcoded a referer header pointing to our home page on every request, regardless of where it actually came from.
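These tells can be encoded as simple per-client checks. A minimal sketch with hypothetical field names and thresholds; the bot-token list is an assumption, and a real classifier would need far more context than three checks:

```python
from dataclasses import dataclass, field

# Hypothetical self-identification markers; real crawler UAs vary widely.
BOT_TOKENS = ("bot", "crawler", "spider")


@dataclass
class ClientSession:
    user_agents: set = field(default_factory=set)
    referers: set = field(default_factory=set)
    paths: list = field(default_factory=list)


def conduct_flags(s: ClientSession, min_paths: int = 3) -> list:
    """Return human-readable flags for the tells described above."""
    flags = []
    if "/robots.txt" not in s.paths:
        flags.append("never fetched robots.txt")
    if not any(tok in ua.lower() for ua in s.user_agents for tok in BOT_TOKENS):
        flags.append("never self-identified as a bot")
    # One referer value reused across many different paths.
    if len(s.referers) == 1 and len(set(s.paths)) >= min_paths:
        flags.append(f"constant referer {next(iter(s.referers))!r} on every path")
    return flags
```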

There's also a small technical tell. One of the first paths it visited was a malformed URL: it had tried to follow a link to a Twitter profile from our home page, and it didn't resolve the URL escapes correctly. Browsers don't do that. HTML parsers built into scraping libraries do.
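The post doesn't say which escape the bot got wrong. One plausible case, assumed here purely for illustration, is an HTML entity such as `&amp;` inside an `href`: a regex-based link extractor keeps it literally, while a real HTML parser decodes it the way a browser does. The page snippet and Twitter URL below are hypothetical:

```python
import re
from html.parser import HTMLParser

# Hypothetical home-page markup: the href contains an HTML-escaped "&".
PAGE = '<a href="https://twitter.com/example?lang=en&amp;s=20">Twitter</a>'


def naive_links(html: str) -> list:
    """Regex extraction: returns the raw attribute text, entities and all."""
    return re.findall(r'href="([^"]*)"', html)


class LinkParser(HTMLParser):
    """html.parser decodes character references in attribute values."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)


parser = LinkParser()
parser.feed(PAGE)
print(naive_links(PAGE)[0])  # https://twitter.com/example?lang=en&amp;s=20
print(parser.links[0])       # https://twitter.com/example?lang=en&s=20
```

A scraper that requests the first URL is asking for a query parameter literally named `amp;s`, which is the kind of malformed request that stands out in logs.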

## What we can prove and what we can't

We can prove the TLS fingerprint. We can prove the IP ranges. We can prove the user agent rotation. We can prove it never read robots.txt. We can prove the same software appeared on more than one cloud. All of this is independently verifiable: ARIN for IP attribution, the JA4 spec for fingerprint interpretation, our cryptographically signed observation chain for the request data.

We can't prove who runs it. We can't prove what they do with the harvested content. We can't prove which other sites they're hitting. We can guess based on behavior — content harvesting at this scale, with this level of evasion, is consistent with AI training data collection or competitive scraping operations. But guessing isn't proof.

## The part that should bother you

Both Alibaba Cloud and AWS prohibit exactly this kind of activity in their Acceptable Use Policies. AWS explicitly forbids "scraping" and "unauthorized data collection." Alibaba Cloud's terms forbid using their infrastructure for "activities that violate the legitimate rights and interests of others." Both providers wrote those rules. Neither enforces them in any way that would prevent what we're describing.

The infrastructure is rented. The policies are written. The enforcement is absent.

If you run a website, this matters to you. The bot we measured is one operator using one software stack. If our small observatory caught it in a few days of operation, the actual scale of this activity across the web is much larger. The same anonymous infrastructure is available to anyone with a credit card. The same lack of enforcement applies to everyone using it.

You probably won't see this kind of traffic in your standard analytics. Your CDN might rate-limit it, but it won't tell you what it was. Your WAF might block some of it, but it won't attribute it. The systems we built to defend the web were built when bots had names and IP reputation meant something. Anonymous operators rotating across cloud providers don't fit that model.

## What we're doing about it

We're publishing what we measure. The data behind this post is part of a larger registry of observed bot behavior, classified by what bots actually do on the open web rather than what they claim. We can't identify the operators. We can identify the patterns. We think that's worth making public.

**Think this bot might be hitting your site?** We'll run a free vulnerability report for you. Send us your domain to **hello@botconduct.org** with subject "Vulnerability Report" and we'll tell you what we see.

The full methodology, registry, and cryptographically signed evidence chain: [botconduct.org](https://botconduct.org)

We're going to keep publishing cases like this. There will be more.

— BotConduct


u/BotConductStandard — 27 days ago

▲ 2 r/Agent_AI+1 crossposts

Spent the past few weeks thinking about agent evaluation and wrote this up. The core claim: the products shipping in this space mostly test static compliance, meaning they define N rules, check each, and aggregate a score.

That doesn't tell you what happens when conditions around the agent change. And it doesn't tell you whether the certification actually means anything across different infrastructures.

Two specific gaps:

**Behavior under change.** Static evaluation measures observable state at a single point in time. It says nothing about how the agent adapts when directives evolve, signals contradict, or adversarial inputs arrive. These are the situations that cause real production incidents.

**Cross-platform recognition.** An agent certified by a Cloudflare-tied tool isn't recognized by DataDome sites, and vice versa. Each vendor has its own walled-garden reputation system. Enterprise buyers want one evaluation that works regardless of what CDN the target uses.

The alternative I've been tracking: scenarios where conditions evolve during evaluation, reports that show decision trajectory instead of checkmarks, and certifications that are infrastructure-neutral — the result stands whether the target is behind any specific bot-management vendor or nothing at all.
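To make the static-versus-dynamic distinction concrete, here is a toy sketch using Python's `urllib.robotparser`. The agent, scenario, and scoring functions are hypothetical and not taken from any product: a static check against one robots.txt gives a caching agent a perfect score, while a scenario that changes the directive mid-run exposes that it keeps acting on a stale copy.

```python
from urllib.robotparser import RobotFileParser


def make_policy(lines):
    """Build a robots.txt policy from a list of lines."""
    rp = RobotFileParser()
    rp.parse(lines)
    return rp


class CachingAgent:
    """Toy agent that reads robots.txt once and never re-checks it."""

    def __init__(self, ua):
        self.ua = ua
        self.cached = None

    def decide(self, live_policy, url):
        if self.cached is None:
            self.cached = live_policy  # snapshot taken on first contact only
        return self.cached.can_fetch(self.ua, url)


def static_check(agent, policy, urls):
    """Static compliance: one policy, one point in time, aggregate score."""
    ok = sum(agent.decide(policy, u) == policy.can_fetch(agent.ua, u) for u in urls)
    return ok / len(urls)


def dynamic_scenario(agent, phases, url):
    """Change the directive between steps and record the decision trajectory
    as (step, agent_fetched, allowed_by_live_policy)."""
    return [(step, agent.decide(p, url), p.can_fetch(agent.ua, url))
            for step, p in enumerate(phases)]


allow_all = make_policy(["User-agent: *", "Disallow:"])
block_reports = make_policy(["User-agent: *", "Disallow: /reports/"])
url = "https://example.org/reports/q3"

print(static_check(CachingAgent("ToyAgent/1.0"), block_reports,
                   ["https://example.org/", url]))  # 1.0
print(dynamic_scenario(CachingAgent("ToyAgent/1.0"),
                       [allow_all, block_reports], url))
# [(0, True, True), (1, True, False)]
```

The checkmark view reports 100%; the trajectory shows the step where the agent fetched a path the live policy had just disallowed.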

Curious what others here use. Are you running dynamic scenarios against your own agents? How do you think about portability of whatever certification you have?


u/BotConductStandard — 27 days ago