r/WebScrapingLab

Do you prefer XPath, CSS selectors, or something else?

I’ve been seeing people argue XPath vs CSS selectors like it’s some huge loyalty thing, so I’m curious how people here actually use them once a scraper gets past the quick test stage.

I used to default to CSS selectors because they felt easier to read while I was building. They still feel cleaner to me when the page has decent class names or a simple structure. The problem is that a lot of sites do not make it that easy. Sometimes the useful data is sitting near a label, inside a weird table, or buried in markup that clearly wasn’t written with scraping in mind.

That’s where XPath started making more sense to me. Not as my default for everything, but as the thing I reach for when CSS starts feeling like I’m forcing it. At this point I don’t really care which one is supposed to be better. I care about what I can come back to later without hating myself when the site changes.
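To make the "data sitting near a label" case concrete, here's a rough sketch of the kind of lookup where XPath reads more naturally than CSS: find the row by its label text, then take the cell next to it. Everything in it is made up for illustration: the table snippet, the labels, and the `value_for_label` helper. It uses the small XPath subset in Python's standard-library `xml.etree.ElementTree`, which only parses well-formed markup, so on a real page you'd likely swap in an HTML-tolerant parser with fuller XPath support.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a real product page.
HTML = """
<table>
  <tr><th>SKU</th><td>A-100</td></tr>
  <tr><th>Price</th><td>19.99</td></tr>
  <tr><th>Stock</th><td>12</td></tr>
</table>
"""


def value_for_label(root, label):
    """Return the text of the <td> in the row whose <th> equals label, or None."""
    # Select by the label's text, not by position or class name.
    # Assumes the label contains no single quotes (it is interpolated as-is).
    cell = root.find(f".//tr[th='{label}']/td")
    if cell is None or cell.text is None:
        return None
    return cell.text.strip()


root = ET.fromstring(HTML)
print(value_for_label(root, "Price"))
```

The point of anchoring on the label is that the lookup should survive rows being reordered or class names being renamed, which is usually what breaks a positional CSS selector.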

reddit.com

u/BlueLagoon226 — 4 days ago

▲ 5 r/WebScrapingLab+1 crossposts

What is your opinion on AI agents for web scraping?

AI agents can help get the ball rolling, but I don’t think they work as the final approach.

I’ve seen people treat them like they can just hand over a finished scraper on the first go. The first draft might look decent, but once you test it you still have to clean up the logic and figure out what it misunderstood.

Sometimes the back and forth takes just as long as writing it yourself. At the end of the day, it's still just a tool to help fill some gaps, and it shouldn't be blindly trusted.

u/BlueLagoon226 — 8 days ago

▲ 4 r/WebScrapingLab+1 crossposts

What tools are currently in your web scraping stack?

I’ve been seeing a lot more Playwright lately, but still plenty of people sticking with Requests/BS4 or Scrapy when the site doesn’t need a browser.

I’m mostly using Python with Requests and BS4 for simple stuff, then Playwright when a site forces it.

Always interesting to see what people actually use once the scraper has to run more than once.

u/BlueLagoon226 — 9 days ago

▲ 5 r/WebScrapingLab+1 crossposts

What was the first web scraping problem that made you realize scraping is harder than it looks?

For me, it was when a scraper worked perfectly on one page, then failed on the next page of the same site because the HTML was slightly different.

At first I thought scraping was just “fetch page, select elements, save data.” Then you run into missing fields, weird pagination, lazy loading, blocked requests, random layout changes, duplicate data, and suddenly your simple script needs error handling, retries, logging, and a way to know when it silently breaks.
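As a rough illustration of what "retries, logging, and a way to know when it silently breaks" can look like, here's a minimal standard-library sketch. The names `with_retries`, `missing_rate`, and `check_batch`, plus the 20% threshold, are all assumptions picked for the example, not a recommended setup. The idea is to retry a flaky fetch with exponential backoff and log each failure, then flag any required field that comes back empty too often in a batch.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scraper")


def with_retries(fetch, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it succeeds, backing off exponentially between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


def missing_rate(records, field):
    """Fraction of records where field is absent or empty."""
    if not records:
        return 1.0  # an empty batch counts as fully missing
    missing = sum(1 for record in records if not record.get(field))
    return missing / len(records)


def check_batch(records, required, max_missing=0.2):
    """Return the required fields whose missing rate exceeds max_missing."""
    return [f for f in required if missing_rate(records, f) > max_missing]
```

A run that "succeeds" but returns mostly empty prices would show up in `check_batch(records, ["title", "price"])` instead of quietly writing bad rows.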

Curious what moment made it click for you.

u/TheReverent — 10 days ago