genre_scraper.py only scraping 4000 images max #18
Comments
try it now - i just updated the scraper
I've tried everything and couldn't fix this. :(
I'll look into this more over the weekend - really sorry it doesn't work, and thanks for bringing it to my attention - leaving this thread open until i fix it...
I am having the same problem. If this is not resolvable, would it be possible for you to upload the complete set of images, which I assume you still have stored somewhere, to a Google Drive folder? It would be incredibly appreciated. Cheers
@JOHN-MARSH i'm still looking into it - i think it might be a question of too many threads working at once... i think it will be resolvable.
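If too many threads really are the cause, one way to test that guess is to cap the number of concurrent downloads and record failures instead of losing them silently. This is only a minimal sketch, not the code in genre_scraper.py: `download_all`, `fetch`, and `max_workers` are hypothetical names, and `fetch` stands in for whatever function downloads a single image.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(urls, fetch, max_workers=4):
    """Run fetch(url) for every URL with at most max_workers threads.

    Returns (results, failures): two dicts keyed by URL, so a failed
    download is reported rather than dropped.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so errors can be attributed.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going; record the failure
                failures[url] = exc
    return results, failures
```

Comparing `len(results)` and `len(failures)` across a few `max_workers` values would show whether the image count depends on concurrency at all.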
Cheers mate. I would try to fix it myself, but web scraping is not something that I have experience with. Keep us updated!
any updates?
Not sure if this is a related issue, but I had a problem scraping image names that are not UTF-8 compatible because they contained accented characters. I fixed the problem by adding
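The snippet from the comment above is not preserved here, so the following is not that fix. It is just a hedged illustration of two common ways to handle accented names in Python 3: percent-encoding the URL path before requesting it, and reducing a name to plain ASCII before saving the file. `safe_url_path` and `ascii_filename` are hypothetical helper names, and the example path is made up.

```python
import unicodedata
from urllib.parse import quote


def safe_url_path(path: str) -> str:
    """Percent-encode non-ASCII characters in a URL path (UTF-8), keeping '/'."""
    return quote(path, safe="/")


def ascii_filename(name: str) -> str:
    """Strip accents so the name can be stored on ASCII-only filesystems."""
    # NFKD splits an accented letter into base letter + combining mark;
    # encoding to ASCII with errors="ignore" then drops the marks.
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(safe_url_path("/images/paul-cézanne/portrait.jpg"))
print(ascii_filename("cézanne.jpg"))
```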
The scraper works fantastically but is unable to get more than 3000-4000 images from wikiart. I tried adjusting num_pages (up to 4000 pages), but it won't scrape more than 4k pictures.
Maybe it is because the webpage itself only shows a maximum of 3600 pictures? As can be seen here: https://www.wikiart.org/en/paintings-by-genre/portrait?select=featured#!#filterName:featured,viewType:masonry
Is there any fix for this? I'd like to train the network on more than 4k pictures.
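One way to check whether the limit comes from the site rather than from num_pages is to page through a listing until a page yields nothing new, and note where that happens. The sketch below assumes a hypothetical `fetch_page(page)` callable that returns the image URLs on one listing page; it is not part of genre_scraper.py and makes no claim about how wikiart paginates.

```python
def collect_until_exhausted(fetch_page, max_pages):
    """Collect unique URLs page by page, stopping when a page adds nothing new.

    Returns (urls, stopped_at), where stopped_at is the first page that
    yielded no new URLs, or None if max_pages was reached first.
    """
    seen = set()
    ordered = []
    for page in range(1, max_pages + 1):
        added = 0
        for url in fetch_page(page):
            if url not in seen:
                seen.add(url)
                ordered.append(url)
                added += 1
        if added == 0:
            # The listing is exhausted (or capped); more pages won't help.
            return ordered, page
    return ordered, None
```

If `stopped_at` comes back far below num_pages, the listing itself is capped, and raising num_pages cannot produce more images. In that case the same deduplicating loop could be run over several different listings and the results merged.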