u/Hot_Cattle8375

I built a scraper that goes through 50 pages and collects data. It works fine on a good connection, but I'm wondering what the best practice is for handling pages that fail to load mid-scrape.

Right now, if one page throws an error, the whole script crashes and I lose everything collected so far. I've seen people mention try/except and retry logic, but I'm not sure what the cleanest approach is.

Is it better to save progress after each page so you can resume, or just wrap everything in a try/except and skip failed pages? And is there a standard number of retries people use before giving up on a page?

Currently using requests and BeautifulSoup if that's relevant.
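
To make the question concrete, here is a rough sketch of what combining the two options might look like: retry each page a few times, skip it if it still fails, and save progress after every successful page so a rerun can resume. All names here (`fetch_page`, `parse_page`, `with_retries`, `scrape`, `progress.json`) are made up for illustration, the title-only parser is a placeholder, and the 3 retries with a 1-second doubling backoff are arbitrary values, not a known standard.

```python
import json
import time
from pathlib import Path


def fetch_page(url, timeout=10):
    """Download one page; raises on network errors and on HTTP 4xx/5xx."""
    import requests  # third-party; imported here so the retry logic runs without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_page(html):
    """Placeholder parser (hypothetical): only grabs the page title."""
    from bs4 import BeautifulSoup  # third-party

    soup = BeautifulSoup(html, "html.parser")
    return {"title": soup.title.string if soup.title else None}


def with_retries(func, arg, retries=3, backoff=1.0, sleep=time.sleep):
    """Call func(arg); on an exception, wait and try again, up to `retries` attempts."""
    for attempt in range(1, retries + 1):
        try:
            return func(arg)
        except Exception:  # broad for brevity; requests.RequestException is narrower
            if attempt == retries:
                raise  # out of attempts, let the caller decide what to do
            sleep(backoff * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...


def scrape(urls, progress_file="progress.json", fetch=fetch_page,
           parse=parse_page, retries=3, backoff=1.0, sleep=time.sleep):
    """Scrape each URL once, skipping failures and resuming from saved progress."""
    path = Path(progress_file)
    # Load results from an earlier run, if any, so finished pages are not refetched.
    results = json.loads(path.read_text()) if path.exists() else {}
    failed = []
    for url in urls:
        if url in results:
            continue  # already collected in a previous run
        try:
            html = with_retries(fetch, url, retries, backoff, sleep)
            results[url] = parse(html)
        except Exception as exc:
            failed.append(url)  # remember it, but keep going with the other pages
            print(f"giving up on {url}: {exc}")
            continue
        path.write_text(json.dumps(results))  # save after every successful page
    return results, failed
```

Calling `scrape(list_of_urls)` would return the collected data plus a list of pages that never loaded. Rerunning the same call skips anything already in `progress.json` and only retries the pages that failed.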

u/Hot_Cattle8375 — 20 days ago