I'm trying to use the polite package for, well, polite web scraping. One problem I've run into is that it uses the robotstxt values for the crawl delays, but in this specific example it ends up with a crawl delay of 2000 (taken from the first group, the one for `*`), which doesn't actually match the robots.txt values.
I think the problem is that one of the User-agent lines in the robots.txt file has a capital "A" ("User-Agent"). Is this something that should definitely be fixed by the site, or would it be possible to make the field matching case-insensitive?
https://r-bloggers.com/robots.txt
Showing only part...
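To illustrate the failure mode, here is a minimal Python sketch of a robots.txt crawl-delay parser (this is not the robotstxt package's implementation, and `mybot` is a made-up agent name standing in for the affected group). When field names are matched case-sensitively, a `User-Agent:` line with a capital "A" is never recognised, so its `Crawl-delay` is dropped and a lookup for that agent falls back to the `*` group:

```python
def crawl_delays(robots_txt, fold_case=True):
    """Parse a {user-agent: crawl-delay} mapping from robots.txt text.

    With fold_case=False, only the exact spelling 'User-agent' is
    recognised, so a 'User-Agent:' group never records its delay.
    """
    delays, agents = {}, []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:              # a blank line ends the current group
            agents = []
            continue
        field, _, value = line.partition(":")
        field, value = field.strip(), value.strip()
        ua_field = field.lower() if fold_case else field
        if ua_field == ("user-agent" if fold_case else "User-agent"):
            agents.append(value)
        elif field.lower() == "crawl-delay" and agents:
            for a in agents:
                delays[a] = float(value)
    return delays

robots = """User-agent: *
Crawl-delay: 2000

User-Agent: mybot
Crawl-delay: 5
"""

strict = crawl_delays(robots, fold_case=False)
lenient = crawl_delays(robots)
print(strict.get("mybot", strict.get("*")))    # 2000.0 -- falls through to '*'
print(lenient.get("mybot", lenient.get("*")))  # 5.0
```

RFC 9309 specifies that robots.txt line names are case-insensitive, so folding the field name (as `fold_case=True` does here) is the standard-conforming behaviour, even if fixing the capitalisation on the site would also make the problem go away.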
Thanks!