Replacement VRHush scraper for new site.
Note: All scene URLs on the site have changed, and the scene ID is no longer part of the URL. The first time the scraper runs, it will therefore treat every scene as needing a scrape, because the new URLs will not exist in an existing XBVR database. The scene IDs still match, so existing scenes will be updated with the new URL. The site redirects the old URLs to the new locations.
The new URLs are based on the scene title. VRHush has a lot of duplicate titles (due to POV/Voyeur/Anal variants of many scenes). All of them are listed on the site, but the links only take you to one scene. Therefore, a full scan with an empty database will return fewer scenes than before.
A few fields are missing from the web pages. However, the pages have a script tag containing a very complete dump of JSON data, so most fields are now populated from the JSON data. This also applies to scraping from the VRHush actor profile pages.
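As a rough illustration of reading fields from an embedded JSON dump, here is a minimal sketch using only the Go standard library. The script tag attributes, the `SceneData` struct and its field names are all hypothetical and are not taken from the VRHush site or from the scraper in this PR.

```go
package main

import (
	"encoding/json"
	"errors"
	"regexp"
)

// scriptJSON captures the body of a <script type="application/json"> tag.
// The tag attributes are an assumption made for this sketch only.
var scriptJSON = regexp.MustCompile(`(?s)<script[^>]*type="application/json"[^>]*>(.*?)</script>`)

// SceneData is a hypothetical subset of the fields such a dump might hold.
type SceneData struct {
	ID    string   `json:"id"`
	Title string   `json:"title"`
	Cast  []string `json:"cast"`
}

// ExtractSceneData finds the first JSON script tag in html and decodes it.
func ExtractSceneData(html string) (SceneData, error) {
	var data SceneData
	m := scriptJSON.FindStringSubmatch(html)
	if m == nil {
		return data, errors.New("no JSON script tag found")
	}
	// m[1] is the text between the opening and closing script tags.
	err := json.Unmarshal([]byte(m[1]), &data)
	return data, err
}
```

A real scraper would more likely select the tag through its HTML parser than with a regular expression. The regex just keeps the sketch free of dependencies.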
Paging of the scene list seems to be done in JS rather than through a URL link, so I loop, bumping the page number manually, and stop when the Next Page button is disabled.
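The paging loop can be sketched roughly as follows. This is not the code in the PR: `Page`, `CollectSceneURLs`, the `fetch` callback and the `maxPages` safety cap are hypothetical names introduced for illustration.

```go
package main

// Page is a hypothetical, already-parsed listing page.
type Page struct {
	SceneURLs    []string
	NextDisabled bool // true when the Next Page button is disabled
}

// CollectSceneURLs requests pages 1, 2, 3, ... through fetch and stops after
// the first page whose Next Page button is disabled, on the first error, or
// once maxPages pages have been read (a guard against looping forever).
func CollectSceneURLs(fetch func(pageNum int) (Page, error), maxPages int) ([]string, error) {
	var urls []string
	for n := 1; n <= maxPages; n++ {
		p, err := fetch(n)
		if err != nil {
			return urls, err
		}
		urls = append(urls, p.SceneURLs...)
		if p.NextDisabled {
			break // last page reached
		}
	}
	return urls, nil
}
```

The cap is an addition of this sketch. It means a page that never reports a disabled button cannot keep the scraper running indefinitely.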
The existing VRHush scraper had an oversight: it did not link with StashDB. I have fixed this, but it will be of little value for old scenes, since the URLs are completely different now.
Trailers do play, but they seem to stutter a lot in HereSphere for me.
Also fixed a bug where using the scrape_json method for trailers did not add the ContentBaseUrl when provided.
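For illustration, the intended behaviour might look like the sketch below. `WithContentBaseURL` is a hypothetical helper, not the function changed in this PR. Leaving absolute URLs untouched and normalising the slashes are assumptions of the sketch.

```go
package main

import "strings"

// WithContentBaseURL prefixes src with base when a base is provided.
// Assumption: src values that are already absolute are returned unchanged.
func WithContentBaseURL(base, src string) string {
	if base == "" || strings.HasPrefix(src, "http://") || strings.HasPrefix(src, "https://") {
		return src
	}
	// Join with exactly one slash between base and path.
	return strings.TrimRight(base, "/") + "/" + strings.TrimLeft(src, "/")
}
```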
I have also dropped the number of concurrent actor scrapes from 20 to 10, as I have seen occasional timeouts in testing. This affects all actor site scrapes.