u/Greedy_Artist_6005

I’ve been working on a production-grade scraping framework targeting three major US retailers and wanted to share it since each site required a completely different evasion strategy.

GitHub:  
https://github.com/Srmaraghu/walmart-kroger-petco-scraper

### Stack
- TypeScript
- Patchright (Playwright fork)
- impit
- Cheerio

### What makes each scraper interesting

#### Walmart
Instead of relying on brittle CSS selectors, it extracts the `__NEXT_DATA__` hydration JSON embedded in the page (Next.js data layer).
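Below is a minimal sketch of that extraction step. It assumes the page follows the standard Next.js convention of embedding a `<script id="__NEXT_DATA__" type="application/json">` tag. It uses a plain regex so it runs without dependencies. With Cheerio (already in the stack), the lookup would be `$("script#__NEXT_DATA__").html()`. The `getPath` helper is a hypothetical convenience, and no particular Walmart JSON schema is assumed.

```typescript
// Sketch: pull the Next.js hydration payload out of raw HTML.
// Assumption: standard Next.js markup, i.e.
// <script id="__NEXT_DATA__" type="application/json">{...}</script>.
function extractNextData(html: string): unknown {
  // Match the tag by id, tolerating attribute order and quote style.
  const match = html.match(
    /<script\b[^>]*\bid=["']__NEXT_DATA__["'][^>]*>([\s\S]*?)<\/script>/i,
  );
  if (!match) return null; // no payload, e.g. a challenge or error page
  try {
    return JSON.parse(match[1]);
  } catch {
    return null; // truncated or malformed JSON
  }
}

// Hypothetical helper: read a dotted path such as "props.pageProps"
// without throwing when an intermediate key is missing.
function getPath(obj: unknown, path: string): unknown {
  let cur: unknown = obj;
  for (const key of path.split(".")) {
    if (cur === null || typeof cur !== "object" || !(key in cur)) return undefined;
    cur = (cur as Record<string, unknown>)[key];
  }
  return cur;
}
```

On a typical Next.js (pages router) page, `getPath(extractNextData(html), "props.pageProps")` returns the page-level props. A `null` from `extractNextData` is also a cheap signal that the response was not a normal product page.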

Also paired with:
- simulated mouse movement
- auto-scroll behavior
- passive bot-detection evasion
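Here is a sketch of how the scroll-and-mouse behavior could be structured. It is an illustration, not the repo's code. `ScrollablePage` is a hypothetical structural type covering the few `Page` members used here (`mouse.move` with `steps`, `mouse.wheel`, `evaluate`, `waitForTimeout`), which both Playwright and Patchright pages provide. The jitter ranges are arbitrary placeholders.

```typescript
// Hypothetical structural subset of a Playwright/Patchright Page, so the
// sketch works with either library or with a test double.
interface ScrollablePage {
  mouse: {
    move(x: number, y: number, options?: { steps?: number }): Promise<void>;
    wheel(deltaX: number, deltaY: number): Promise<void>;
  };
  evaluate(expression: string): Promise<unknown>;
  waitForTimeout(ms: number): Promise<void>;
}

// Random integer in [min, max]; the ranges used below are placeholders.
function jitter(min: number, max: number): number {
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Scroll until the viewport reaches the bottom of the document (or maxSteps),
// pausing between wheel events so lazy-loaded content can render.
async function autoScroll(page: ScrollablePage, maxSteps = 30): Promise<number> {
  let steps = 0;
  while (steps < maxSteps) {
    // Move the cursor along an interpolated path instead of jumping to it.
    await page.mouse.move(jitter(100, 800), jitter(100, 600), { steps: jitter(5, 15) });
    await page.mouse.wheel(0, jitter(600, 900));
    await page.waitForTimeout(jitter(300, 900));
    steps++;
    // Checked after the pause, so content appended at the bottom keeps the loop going.
    const atBottom = await page.evaluate(
      "window.scrollY + window.innerHeight >= document.documentElement.scrollHeight - 2",
    );
    if (atBottom === true) break;
  }
  return steps; // number of wheel events sent
}
```

Checking the bottom condition after each pause, rather than comparing page heights, avoids stopping early on long pages whose height does not change while you scroll through existing content.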

#### Kroger
Probably the trickiest target.

They aggressively serve `"Robot or human?"` challenge pages.

Current solution uses:
- a deferred retry queue
- blocked URLs parked temporarily
- retries with fresh browser contexts after the main crawl loop finishes
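Below is one way to structure that queue. This is a sketch under a few assumptions. Challenge pages are detected by the `"Robot or human?"` text mentioned above. Each retry gets its own short-lived fetcher. `openFreshFetcher` is a hypothetical factory; in Patchright it would wrap `browser.newContext()`, `context.newPage()`, `page.goto(url)`, `page.content()` and finally `context.close()`. The round count and cool-down are placeholder parameters.

```typescript
type Fetcher = (url: string) => Promise<string>;

interface FreshFetcher {
  fetch: Fetcher;
  close: () => Promise<void>; // e.g. closes the underlying browser context
}

interface CrawlResult {
  pages: Map<string, string>; // url -> html
  failed: string[]; // still blocked after all retry rounds
}

// Assumption: challenge pages are recognizable by this text.
const isChallenge = (html: string): boolean => html.includes("Robot or human?");

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function crawlWithDeferredRetry(
  urls: string[],
  mainFetch: Fetcher,
  openFreshFetcher: () => Promise<FreshFetcher>, // hypothetical factory
  retryRounds = 2,
  cooldownMs = 0,
): Promise<CrawlResult> {
  const pages = new Map<string, string>();
  let parked: string[] = [];

  // Main pass: never stall on a blocked URL, just park it for later.
  for (const url of urls) {
    try {
      const html = await mainFetch(url);
      if (isChallenge(html)) parked.push(url);
      else pages.set(url, html);
    } catch {
      parked.push(url);
    }
  }

  // Deferred passes run only after the main loop, with one fresh context per URL.
  for (let round = 0; round < retryRounds && parked.length > 0; round++) {
    await sleep(cooldownMs);
    const stillBlocked: string[] = [];
    for (const url of parked) {
      const fresh = await openFreshFetcher();
      try {
        const html = await fresh.fetch(url);
        if (isChallenge(html)) stillBlocked.push(url);
        else pages.set(url, html);
      } catch {
        stillBlocked.push(url);
      } finally {
        await fresh.close();
      }
    }
    parked = stillBlocked;
  }
  return { pages, failed: parked };
}
```

Because blocked URLs are parked instead of retried inline, one stubborn page cannot hold up the rest of the crawl. The `finally` block makes sure every fresh context is closed, even when its request throws.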

It also uses `page.evaluate()` to call internal APIs from inside the page's own JavaScript context. The review requests therefore carry the page's cookies and origin and look like native traffic.
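Here is a minimal sketch of issuing a request from inside the page. The function passed to `evaluate` is serialized and executed in the browser, so it cannot close over Node-side variables; the path is passed in as the argument instead. Because the request runs in the page's origin, same-origin calls send the page's cookies. `EvaluatingPage`, `fetchJsonInPage` and any endpoint path are hypothetical names, since the post does not say which Kroger endpoints are called.

```typescript
// Hypothetical structural subset of a Playwright/Patchright Page.
interface EvaluatingPage {
  evaluate<R, A>(fn: (arg: A) => R | Promise<R>, arg: A): Promise<R>;
}

interface InPageResponse {
  status: number;
  body: string;
}

// Issue a request from the page's own JavaScript context and parse the JSON body.
async function fetchJsonInPage<T = unknown>(page: EvaluatingPage, path: string): Promise<T> {
  const res = await page.evaluate(async (url: string): Promise<InPageResponse> => {
    // Runs in the browser: only browser globals and `url` are available here.
    const r = await fetch(url, {
      credentials: "include", // send the page's cookies
      headers: { accept: "application/json" },
    });
    return { status: r.status, body: await r.text() };
  }, path);
  if (res.status < 200 || res.status >= 300) {
    throw new Error(`in-page request to ${path} failed: HTTP ${res.status}`);
  }
  return JSON.parse(res.body) as T;
}
```

Returning a plain `{ status, body }` object keeps the value serializable across the browser/Node boundary. JSON parsing and error handling stay on the Node side.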

#### Petco
Focused on large-scale store directory crawling across thousands of pages.

Uses `impit` (a browser-emulating HTTP client) instead of full Playwright for traversal, which made traversal significantly faster.
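The traversal side can be sketched as a bounded-concurrency, deduplicating crawl. The HTTP client is injected as `fetchText`. In the repo that role is played by impit, whose exact call shape is not reproduced here. `extractLinks` (for example, a Cheerio selector over directory links), the default concurrency and the page cap are hypothetical placeholders.

```typescript
type FetchText = (url: string) => Promise<string>;
type ExtractLinks = (url: string, html: string) => string[];

interface DirectoryCrawl {
  pages: Map<string, string>; // url -> html
  failed: Set<string>;
}

// FIFO (roughly breadth-first) crawl with at most `concurrency` requests
// in flight and at most `maxPages` requests started in total.
async function crawlDirectory(
  seeds: string[],
  fetchText: FetchText,
  extractLinks: ExtractLinks,
  concurrency = 8,
  maxPages = 10_000,
): Promise<DirectoryCrawl> {
  const seen = new Set<string>(seeds); // dedupe: each URL is queued once
  const queue = [...seeds];
  const pages = new Map<string, string>();
  const failed = new Set<string>();
  let started = 0;

  async function worker(): Promise<void> {
    // A worker exits when the queue is momentarily empty; workers still running
    // drain anything they enqueue later, so below the cap no URL is skipped.
    while (queue.length > 0 && started < maxPages) {
      const url = queue.shift() as string;
      started++;
      try {
        const html = await fetchText(url);
        pages.set(url, html);
        for (const link of extractLinks(url, html)) {
          if (!seen.has(link)) {
            seen.add(link);
            queue.push(link);
          }
        }
      } catch {
        failed.add(url); // record and keep crawling
      }
    }
  }

  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return { pages, failed };
}
```

Without a browser per page, the cost of each directory page is a single HTTP round trip. That is the main reason an HTTP-client crawl scales to thousands of pages more cheaply than driving Playwright.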

It also pulls Google Maps ratings and review counts for each store by extracting data from the Maps direction links and parsing the returned HTML.
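Below is a sketch of the first half of that step: collecting the direction links from a store page. It assumes the links use Google's documented Maps URLs format (`https://www.google.com/maps/dir/?api=1&destination=<address>`) or the older `daddr` parameter. The post does not describe Google's returned markup, so the rating and review-count parsing is deliberately not sketched. `extractMapsDestinations` is a hypothetical name.

```typescript
// Hypothetical helper: collect the destinations of Google Maps direction
// links found in a store page's HTML.
function extractMapsDestinations(html: string): string[] {
  const destinations: string[] = [];
  for (const m of html.matchAll(/href=["']([^"']+)["']/gi)) {
    const href = m[1].replace(/&amp;/g, "&"); // undo HTML entity escaping
    let url: URL;
    try {
      url = new URL(href);
    } catch {
      continue; // relative or malformed href
    }
    const host = url.hostname;
    const isMaps =
      host === "maps.google.com" ||
      ((host === "google.com" || host === "www.google.com") && url.pathname.startsWith("/maps"));
    if (!isMaps) continue;
    // The Maps URLs API uses `destination`; older links use `daddr`.
    const dest = url.searchParams.get("destination") ?? url.searchParams.get("daddr");
    if (dest) destinations.push(dest);
  }
  return destinations;
}
```

`URLSearchParams` handles percent-decoding, so the returned destinations are plain strings ready to be used as lookup keys for the follow-up request.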

Would love feedback, especially on the Kroger deferred queue approach — curious if others have solved similar retry/challenge patterns differently.

Stars appreciated if the repo is useful.

Built a multi-target retail scraper in TypeScript — Walmart, Kroger, and Petco with anti-bot evasion (Patchright + impit)