r/learnpython
I built a scraper that goes through 50 pages and collects data. It works fine on a good connection but I'm wondering what the best practice is for handling pages that fail to load mid-scrape.
Right now, if one page throws an error, the whole script crashes and I lose everything collected so far. I've seen people mention try/except and retry logic, but I'm not sure what the cleanest approach is.
Is it better to save progress after each page so you can resume, or just wrap everything in a try/except and skip failed pages? And is there a standard number of retries people use before giving up on a page?
Currently using requests and BeautifulSoup if that's relevant.
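To make the question concrete, here is a minimal sketch of how the two ideas (retry then skip, and save after each page) could be combined. It is one possible arrangement under stated assumptions, not an established standard. The names `fetch_with_retries`, `scrape`, `fetch_html`, the `results.jsonl` file, and the default of 3 retries are all hypothetical choices made for illustration.

```python
import json
import time
from pathlib import Path


def fetch_html(url):
    """Hypothetical fetcher using requests; raises on network or HTTP errors."""
    import requests  # imported here so the rest of the sketch runs without it

    resp = requests.get(url, timeout=10)
    resp.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return resp.text


def fetch_with_retries(fetch, url, max_retries=3, backoff=1.0):
    """Call fetch(url) up to max_retries times; return None if every attempt fails."""
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # broad on purpose for a sketch; narrow it in real code
            print(f"{url}: attempt {attempt} failed ({exc})")
            if attempt < max_retries:
                # Wait longer after each failure: backoff, 2*backoff, 4*backoff, ...
                time.sleep(backoff * 2 ** (attempt - 1))
    return None


def scrape(urls, fetch, parse, out_path="results.jsonl", max_retries=3, backoff=1.0):
    """Scrape urls, appending one JSON line per page so progress survives a crash.

    Returns the list of URLs that still failed after all retries.
    """
    out = Path(out_path)

    # Resume support: collect URLs already saved by a previous run.
    done = set()
    if out.exists():
        with out.open() as f:
            for line in f:
                done.add(json.loads(line)["url"])

    failed = []
    with out.open("a") as f:
        for url in urls:
            if url in done:
                continue  # already scraped earlier
            html = fetch_with_retries(fetch, url, max_retries, backoff)
            if html is None:
                failed.append(url)  # skip this page, keep going
                continue
            f.write(json.dumps({"url": url, "data": parse(html)}) + "\n")
            f.flush()  # push the record out right away, not at the end
    return failed
```

With requests and BeautifulSoup this might be called as `failed = scrape(urls, fetch_html, parse_page)`, where `parse_page` is whatever function turns the HTML into a JSON-serializable value. Rerunning the same call skips pages already in the output file, and the returned list shows which pages to look at by hand.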
u/Hot_Cattle8375 — 20 days ago