r/Playwright
About scraping only URLs with .pdf extensions
I used Playwright to crawl a site, scrape all the URLs found on every page, and store them in an Excel file.
Afterwards, I ran another script to pick out only the URLs that are downloadable product PDFs (the only ones I want) from that huge Excel file.
But I am not getting accurate results.
More than 20k URLs were scraped in total, but only about 100 were identified as PDFs, which is wrong.
Can anyone suggest some ideas for what to do here?
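To make the filtering step concrete, here is a minimal sketch of what the second script could look like. It is not the exact script used: it assumes the URLs have already been read from the Excel file into a Python list, and the names `is_pdf_url`, `filter_pdf_urls`, and the `example.com` URLs are hypothetical. The idea is to check the parsed URL path rather than the raw string, so that uppercase extensions, query strings, and fragments do not cause misses.

```python
from urllib.parse import urlparse, unquote


def is_pdf_url(url: str) -> bool:
    """Return True if the URL path ends in .pdf.

    The check ignores case, query strings (?x=1), and fragments (#page=3),
    because it looks only at the parsed path component.
    """
    path = unquote(urlparse(url.strip()).path)
    return path.lower().endswith(".pdf")


def filter_pdf_urls(urls):
    """Keep unique PDF URLs, preserving first-seen order."""
    seen = set()
    result = []
    for url in urls:
        if not isinstance(url, str):
            continue  # skip empty spreadsheet cells (None, NaN, numbers)
        cleaned = url.strip()
        if cleaned and cleaned not in seen and is_pdf_url(cleaned):
            seen.add(cleaned)
            result.append(cleaned)
    return result


# Hypothetical sample data standing in for the scraped URL column.
sample_urls = [
    "https://example.com/docs/manual.pdf",
    "https://example.com/docs/Datasheet.PDF",
    "https://example.com/files/spec.pdf?version=2",
    "https://example.com/files/guide.pdf#page=3",
    "https://example.com/products/pdf-viewer",
    "https://example.com/about.html",
    "https://example.com/docs/manual.pdf",  # duplicate
]

pdf_urls = filter_pdf_urls(sample_urls)
for url in pdf_urls:
    print(url)
```

A filter like this still misses PDFs served from URLs with no `.pdf` in the path (for example, download endpoints), so checking the response `Content-Type` header for `application/pdf` may be needed for those.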
u/Frequent_Stretch4304 — 3 days ago