
Built a multi-target retail scraper in TypeScript — Walmart, Kroger, and Petco with anti-bot evasion (Patchright + impit)
I’ve been working on a production-grade scraping framework targeting three major US retailers and wanted to share it since each site required a completely different evasion strategy.
GitHub:
https://github.com/Srmaraghu/walmart-kroger-petco-scraper
### Stack
- TypeScript
- Patchright (Playwright fork)
- impit
- Cheerio
### What makes each scraper interesting
#### Walmart
Instead of relying on brittle CSS selectors, it extracts the `__NEXT_DATA__` hydration JSON embedded in the page (Next.js data layer).
The extraction is paired with:
- simulated mouse movement
- auto-scroll behavior
- passive bot-detection evasion
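For readers who have not worked with Next.js hydration data, here is a minimal sketch of the `__NEXT_DATA__` extraction idea. It is not the repo's code: the function names, the sample HTML, and the `product` key are made up for illustration, and a regex stands in for Cheerio only to keep the example dependency-free.

```typescript
// Sketch: pull the Next.js hydration payload out of raw HTML.
// Assumption: the page embeds <script id="__NEXT_DATA__" type="application/json">.
// A regex is used only to avoid dependencies; an HTML parser such as Cheerio
// is the more robust choice on real pages.
function extractNextData(html: string): unknown {
  const match = html.match(
    /<script[^>]*\bid=["']__NEXT_DATA__["'][^>]*>([\s\S]*?)<\/script>/i
  );
  if (!match) return null;
  try {
    return JSON.parse(match[1] ?? "");
  } catch {
    return null; // malformed or truncated payload
  }
}

// Hypothetical helper: walk a key path without assuming the payload shape.
function getPath(root: unknown, path: string[]): unknown {
  let node: unknown = root;
  for (const key of path) {
    if (typeof node !== "object" || node === null) return undefined;
    node = (node as Record<string, unknown>)[key];
  }
  return node;
}

// Made-up sample page; the real payload's key layout will differ.
const sampleHtml =
  '<html><body><script id="__NEXT_DATA__" type="application/json">' +
  '{"props":{"pageProps":{"product":{"name":"Example item","price":4.99}}}}' +
  "</script></body></html>";

const nextData = extractNextData(sampleHtml);
console.log(getPath(nextData, ["props", "pageProps", "product", "name"]));
```

Reading the JSON by key path tends to survive markup and class-name changes that would break CSS selectors, although the key layout itself can still change between deployments.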
#### Kroger
Probably the trickiest target.
They aggressively serve `"Robot or human?"` challenge pages.
Current solution uses:
- a deferred retry queue
- blocked URLs parked temporarily
- retries with fresh browser contexts after the main crawl loop finishes
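The deferred queue pattern above can be sketched roughly as follows. This is an illustration under assumptions, not the repo's implementation: `fetchPage` is a hypothetical stand-in for "open a fresh browser context and load the URL", and the challenge check simply looks for the "Robot or human?" text.

```typescript
// Hypothetical result shape; the real scraper's types may differ.
type FetchResult = { url: string; html: string };

// Stand-in for "create a fresh browser context, load the URL, return the HTML".
type FetchPage = (url: string) => Promise<string>;

// Assumption: a challenge page can be recognised by its "Robot or human?" text.
const isChallenge = (html: string): boolean => html.includes("Robot or human?");

async function crawlWithDeferredRetries(
  urls: string[],
  fetchPage: FetchPage,
  maxRetryRounds = 2
): Promise<{ results: FetchResult[]; failed: string[] }> {
  const results: FetchResult[] = [];
  let deferred: string[] = [];

  // Main loop: never retry inline, just park blocked URLs and move on.
  for (const url of urls) {
    const html = await fetchPage(url);
    if (isChallenge(html)) deferred.push(url);
    else results.push({ url, html });
  }

  // Retry rounds start only after the main loop has finished.
  for (let round = 0; round < maxRetryRounds && deferred.length > 0; round++) {
    const stillBlocked: string[] = [];
    for (const url of deferred) {
      const html = await fetchPage(url);
      if (isChallenge(html)) stillBlocked.push(url);
      else results.push({ url, html });
    }
    deferred = stillBlocked;
  }

  // Anything still parked after the last round is reported, not dropped.
  return { results, failed: deferred };
}
```

Parking blocked URLs instead of retrying immediately puts time between the challenge and the next attempt, and it keeps one stubborn URL from stalling the rest of the crawl.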
It also uses `page.evaluate()` to trigger internal API calls directly from the browser context, so review requests appear native.
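A rough sketch of the in-page request idea, assuming a Playwright-style `page.evaluate(pageFunction, arg)`. The `PageLike` interface, `fetchInPage`, and the `ApiResult` shape are hypothetical names chosen so the example compiles without Patchright installed. The actual Kroger endpoints are not shown in this post, so the URL is left as a parameter.

```typescript
// Minimal structural type mirroring the shape of Playwright's
// page.evaluate(pageFunction, arg); hypothetical, for illustration only.
interface PageLike {
  evaluate<T, A>(fn: (arg: A) => T | Promise<T>, arg: A): Promise<T>;
}

// Hypothetical return shape for this sketch.
type ApiResult = { status: number; body: unknown };

async function fetchInPage(page: PageLike, apiUrl: string): Promise<ApiResult> {
  // The callback runs inside the page, so the request is issued by the
  // browser itself and goes out with the page's origin and cookies.
  return page.evaluate(async (url: string): Promise<ApiResult> => {
    const res = await fetch(url, { credentials: "include" });
    return { status: res.status, body: await res.json() };
  }, apiUrl);
}
```

Only serialisable values cross the `evaluate` boundary, which is why the sketch returns a plain object rather than the `Response` itself.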
#### Petco
Focused on large-scale store directory crawling across thousands of pages.
Uses `impit` (browser-emulating HTTP client) instead of full Playwright for traversal, which made it significantly faster.
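To illustrate why a plain HTTP client suits this kind of traversal, here is a minimal sketch of a breadth-first directory crawl with a pluggable fetcher. Everything in it is an assumption for illustration: the `fetchHtml` parameter, the link-prefix filter, and the batch size are not taken from the repo, and the regex link extractor stands in for Cheerio to keep the example dependency-free.

```typescript
// Pluggable fetcher. With impit this could be something like
//   const impit = new Impit({ browser: "chrome" });
//   const fetchHtml = async (u: string) => (await impit.fetch(u)).text();
// (based on impit's fetch-style API; check it against the installed version).
type FetchHtml = (url: string) => Promise<string>;

// Hypothetical link extractor: collects href values starting with `prefix`.
// Real pages would also need relative URLs resolved against the page URL.
function extractLinks(html: string, prefix: string): string[] {
  const links = new Set<string>();
  const re = /href="([^"]+)"/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    const href = m[1];
    if (href && href.startsWith(prefix)) links.add(href);
  }
  return Array.from(links);
}

async function crawlDirectory(
  startUrl: string,
  fetchHtml: FetchHtml,
  prefix: string,
  concurrency = 8
): Promise<Map<string, string>> {
  const pages = new Map<string, string>();
  const seen = new Set<string>([startUrl]);
  let frontier: string[] = [startUrl];

  // Breadth-first: fetch one directory level at a time, in bounded batches.
  while (frontier.length > 0) {
    const next: string[] = [];
    for (let i = 0; i < frontier.length; i += concurrency) {
      const batch = frontier.slice(i, i + concurrency);
      const fetched = await Promise.all(
        batch.map(async (u) => [u, await fetchHtml(u)] as const)
      );
      for (const [url, html] of fetched) {
        pages.set(url, html);
        for (const link of extractLinks(html, prefix)) {
          if (!seen.has(link)) {
            seen.add(link);
            next.push(link);
          }
        }
      }
    }
    frontier = next;
  }
  return pages;
}
```

Because each directory page only needs its HTML, no browser has to render anything, and the batch size is the one knob for trading speed against the risk of rate limiting.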
It also pulls Google Maps ratings and review counts by extracting data from Maps direction links and parsing the returned HTML.
Would love feedback, especially on the Kroger deferred queue approach — curious if others have solved similar retry/challenge patterns differently.
Stars appreciated if the repo is useful.
u/Greedy_Artist_6005 — 2 days ago