VRHush site change #1557

Merged 1 commit on Jan 2, 2024

toshski (Contributor) commented Jan 1, 2024

Replacement VRHush scraper for new site.

Note: Scene URLs on the site have all changed, and the Scene Id is no longer part of the URL. This means that the first time the scraper runs, it will treat every scene as needing a scrape, because none of the new URLs will exist in an existing XBVR database. The Scene Ids still match, so existing scenes will be updated with the new URL. The site redirects the old URLs to their new locations.
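
To illustrate why matching on the scene ID, rather than the URL, keeps that first run from creating duplicates, here is a minimal Go sketch. `Scene`, `needsScrape` and `upsertByID` are hypothetical names working on a simplified in-memory map; they are not XBVR's actual models or database code:

```go
package main

// Scene is a hypothetical, trimmed-down stand-in for a scene record;
// only the fields needed to show ID-based matching are included.
type Scene struct {
	SceneID string // stable site-specific ID (format is illustrative)
	URL     string
}

// needsScrape mirrors a URL-based check: any URL not already stored is
// treated as unscraped. After the site change, every new URL misses.
func needsScrape(knownURLs map[string]bool, url string) bool {
	return !knownURLs[url]
}

// upsertByID merges a freshly scraped scene into records keyed by scene ID.
// If the ID is already known, only the URL is refreshed, so a scene whose
// URL changed is updated in place rather than duplicated.
// It returns true when a new record was added.
func upsertByID(db map[string]*Scene, scraped Scene) bool {
	if existing, ok := db[scraped.SceneID]; ok {
		existing.URL = scraped.URL
		return false
	}
	s := scraped
	db[scraped.SceneID] = &s
	return true
}
```

With a stored scene still carrying its old URL, `needsScrape` reports the new title-based URL as unscraped, but `upsertByID` only rewrites the URL of the matching record instead of adding a second one.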

The new URLs are based on the scene title. VRHush has a lot of duplicate titles (due to POV/Voyeur/Anal variants of many scenes). All of them are listed on the site, but the links for a shared title only take you to one scene. Therefore, a full scan with an empty database will return fewer scenes than before.
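
A rough sketch of why title-based URLs lose the variants: if each listing's URL is derived from its title, listings that share a title collapse to one URL. The `slugify` rules and the base URL below are assumptions for illustration, not the site's real scheme:

```go
package main

import (
	"regexp"
	"strings"
)

var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// slugify is a hypothetical approximation of how a title becomes a URL
// path segment; the site's real slug rules may differ.
func slugify(title string) string {
	s := nonAlnum.ReplaceAllString(strings.ToLower(title), "-")
	return strings.Trim(s, "-")
}

// uniqueSceneURLs returns one URL per distinct slug, in first-seen order.
// Listings that share a title map to the same URL and are kept only once,
// which is why a full scan can yield fewer scenes than there are listings.
func uniqueSceneURLs(base string, titles []string) []string {
	seen := make(map[string]bool)
	var urls []string
	for _, t := range titles {
		u := base + slugify(t)
		if !seen[u] {
			seen[u] = true
			urls = append(urls, u)
		}
	}
	return urls
}
```

For example, two listings titled "Morning Glow" plus one titled "Late Night" yield only two distinct URLs.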

A few fields were missing from the web pages. However, the pages include a script tag containing a very complete JSON dump of the data, so most fields are now populated from the JSON. The same applies when scraping the VRHush actor profile pages.
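
One way to read such an embedded JSON dump in Go is sketched below. The script tag's `id="scene-data"` attribute and the `SceneData` field names are hypothetical, since the real tag and JSON keys are not shown here; a production scraper would more likely select the tag with an HTML parser than with a regular expression:

```go
package main

import (
	"encoding/json"
	"errors"
	"regexp"
)

// scriptJSON captures the body of a <script> tag with a hypothetical
// id "scene-data". (?s) lets . match newlines inside the tag body.
var scriptJSON = regexp.MustCompile(`(?s)<script[^>]*id="scene-data"[^>]*>(.*?)</script>`)

// SceneData holds a few illustrative fields; the real JSON keys are assumptions.
type SceneData struct {
	ID       string   `json:"id"`
	Title    string   `json:"title"`
	Duration int      `json:"duration"`
	Cast     []string `json:"cast"`
}

// parseSceneData extracts the embedded JSON from a page and decodes it.
func parseSceneData(html string) (SceneData, error) {
	var d SceneData
	m := scriptJSON.FindStringSubmatch(html)
	if m == nil {
		return d, errors.New("embedded JSON script tag not found")
	}
	err := json.Unmarshal([]byte(m[1]), &d)
	return d, err
}
```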

Paging through the scene list appears to be done in JavaScript rather than via URL links, so I loop through the pages, bumping the page number manually, and stop when the Next Page button is disabled.
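
The paging loop can be sketched like this. The `fetch` callback stands in for loading one listing page and detecting whether the Next Page button is disabled; that callback, `PageResult`, 1-based page numbers and the `maxPages` safety cap are all assumptions:

```go
package main

// PageResult is what a single listing-page fetch yields in this sketch.
type PageResult struct {
	SceneURLs    []string
	NextDisabled bool // true when the page's Next Page button is disabled
}

// collectAllPages bumps the page number manually and stops once a page
// reports its Next Page button as disabled. maxPages caps the loop so a
// broken disabled-button check cannot make it run forever.
func collectAllPages(fetch func(page int) (PageResult, error), maxPages int) ([]string, error) {
	var all []string
	for page := 1; page <= maxPages; page++ {
		res, err := fetch(page)
		if err != nil {
			return all, err
		}
		all = append(all, res.SceneURLs...)
		if res.NextDisabled {
			break
		}
	}
	return all, nil
}
```

Injecting `fetch` keeps the stopping rule testable without a network connection.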

The existing VRHush scraper also had an oversight: it did not link scenes with StashDB. I have fixed this, but it will be of little value for old scenes, since their URLs are now completely different.

Trailers do play, but they seem to stutter a lot in HereSphere for me.

Also fixed a bug where using the scrape_json method for trailers did not apply the ContentBaseUrl when one was provided.
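
A hedged sketch of applying an optional base URL to a trailer path read from JSON. It uses RFC 3986 resolution via Go's `net/url` (`URL.ResolveReference`), which is a swapped-in technique; the actual fix may simply concatenate strings. `resolveTrailerURL` is a hypothetical name:

```go
package main

import "net/url"

// resolveTrailerURL applies an optional base URL to a trailer source.
// contentBaseURL only mirrors the ContentBaseUrl setting; this is not XBVR's code.
// With no base, src is returned unchanged; an absolute src ignores the base.
func resolveTrailerURL(contentBaseURL, src string) (string, error) {
	if contentBaseURL == "" {
		return src, nil
	}
	base, err := url.Parse(contentBaseURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(src)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}
```

With resolution semantics the trailing slash matters: a base of `https://cdn.example.com/videos` resolves `trailer.mp4` to `https://cdn.example.com/trailer.mp4`.
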
I have also reduced the number of concurrent actor scrapes from 20 to 10, since I saw occasional timeouts in testing. This affects all actor site scrapes.

crwxaj merged commit 3295daa into xbapps:master on Jan 2, 2024
1 check passed
toshski deleted the VRHush_site_changes branch on January 3, 2024 05:14