u/Frequent_Stretch4304

About scraping only URLs with .pdf extensions

I have used Playwright to crawl a website, scrape all the URLs present on all of its webpages, and store them in an Excel file.

Afterwards, another script picks only those URLs that are downloadable product PDFs (the only ones I want) from that huge Excel file.

But I am not getting accurate results.

The total number of scraped URLs is more than 20k, but the script finds only about 100 PDFs, which is wrong.
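One possible cause (an assumption, since the filtering script isn't shown here) is that a plain string check such as `url.endswith(".pdf")` misses links with query strings, fragments, or uppercase extensions. A minimal sketch of a more tolerant filter, assuming the URLs are already loaded into a Python list; the function name `is_pdf_url` and the sample URLs are hypothetical, for illustration only:

```python
from urllib.parse import urlparse


def is_pdf_url(url: str) -> bool:
    """Return True if the URL's path ends in .pdf (case-insensitive).

    Only the path component is checked, so query strings (?version=2)
    and fragments (#page=3) do not hide the extension.
    """
    path = urlparse(url.strip()).path
    return path.lower().endswith(".pdf")


# Hypothetical sample URLs, not taken from the real crawl.
urls = [
    "https://example.com/docs/manual.pdf",
    "https://example.com/docs/Datasheet.PDF?version=2",
    "https://example.com/products/widget",
    "https://example.com/download.pdf#page=3",
    "https://example.com/pdf-guides/index.html",
]

pdf_urls = [u for u in urls if is_pdf_url(u)]
print(len(pdf_urls))  # 3 of the 5 sample URLs match
```

This would still miss PDFs served from URLs with no .pdf in the path (for example, download endpoints); catching those would probably need a check of the response's Content-Type header instead.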

Can anyone suggest some ideas for what to do here?

reddit.com
u/Frequent_Stretch4304 — 3 days ago