[HQPorner] Add new extractor: video, playlist and search pages #32245
base: master
Since content is generally subject to copyright, that's not the deciding factor. Look at other sites supported by yt-dl and you will find copyrighted material that may be hosted with, or (shockingly) without, the copyright holder's permission: not just porn, but user-generated content sites like YouTube. Also, #7201 applies to the site as it was in 2015, though there's no indication of whether the site has changed. The project's policy has two parts:
The linked guidance is this:
I would say that "the whole front page of the service is filled with videos they are not allowed to distribute" could cover almost every porn site that yt-dl supports. I looked at a large sample of yt-dl[p] porn site extractors and most failed the test: e.g., daft.sex.

The porn ecosystem is entirely different from, say, broadcasting and movie streaming sites. If a movie is available on BBC iPlayer, we can take it that the BBC is distributing it with permission. Sites like YouTube and Vimeo do distribute actual user-generated content alongside material for which the submitter may not have the rights. In fact, YT conspires with its users and the music industry to promote ad-supported content submitted by users who do not own the content. A typical porn site provides no explicit copyright attribution, and the user can't be expected to know which video might have been licensed by the copyright owner, even in the presence of a logo or watermark. Does the site have some sort of ad-revenue-sharing deal with content owners? Is the content being supplied legitimately as a promotion? No-one can tell for sure, and the waters are further muddied because the same company may operate both a paid content site and ad-supported "user-generated" sites that include old content from the paid site.

My interpretation of the policy is rather that yt-dl is like a web browser: it cannot be responsible for identifying whether any particular content is legitimate. If a user is concerned about some material, don't download it. The site must allow copyright owners to remove unauthorised content. If the site claims to follow the DMCA, that should be enough, as it is for YouTube. Therefore, I don't entirely support the statement about the DMCA in the guidance quoted above. Also, the project requirement "example URLs should not violate any copyrights" seems very difficult to interpret in this context.
One might think that a URL is a reference that cannot be restricted by copyright: otherwise no-one could write …

Regarding HQPorner, there are particular issues:
Because of point 1 above, as well as the ecosystem context, it's difficult to tell whether any video is posted with valid permission. Maybe (point 2) the material is actually being provided by the owners as a sort of ad-supported front? The example URL in yt-dlp/yt-dlp#7147 showed a gigantic banner ad for the apparent owner of the video. In this PR I selected a more anonymous video for the main test.
Boilerplate: own code, new extractor
## Please follow the guide below

- Put an `x` into all the boxes [ ] relevant to your pull request (like that [x])

### Before submitting a pull request make sure you have:
### In order to be accepted and merged into youtube-dl, each piece of code must be in the public domain or released under the Unlicense. Check one of the following options:

### What is the purpose of your pull request?

### Description of your pull request and other information
This PR adds an extractor module for hqporner.com, as suggested in yt-dlp/yt-dlp#7116. The module provides three extractors:
- `HQPornerIE` for video pages
- `HQPornerListIE` for playlist pages based on category or performer
- `HQPornerSearchIE` for search pages

For video pages, the following metadata is extracted:
- `title` from video caption or page title
- `age_limit` fixed at 18, though pages include an RTA tag
- `upload_date` from approximate age in caption
- `description` from "featuring ..." in caption
- `duration` from caption or page description
- `categories` from caption or page meta keywords
- `tags` from page description
- `thumbnail` from HTML5 video

Playlist entries get `title`, `duration` and `thumbnail`.
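Of the caption-derived fields above, `upload_date` and `duration` are the ones that need actual parsing. The following is a minimal sketch of how that might be done, not the code in this PR. It assumes captions read like `3 weeks ago` and `1h 02m 03s`, which are guesses about the site's text, and `approx_upload_date` and `caption_duration` are hypothetical helper names.

```python
import re
from datetime import date, timedelta

# Assumption: the caption states the age as "<n> <unit>(s) ago".
# Months and years are approximated as a fixed number of days.
_DAYS_PER_UNIT = {'day': 1, 'week': 7, 'month': 30, 'year': 365}

# Assumption: the caption states the duration like "1h 02m 03s" or "12m 34s".
_SECONDS_PER_UNIT = {'h': 3600, 'm': 60, 's': 1}


def approx_upload_date(age_text, today=None):
    """Convert an approximate age such as '3 weeks ago' to a YYYYMMDD string.

    Returns None if the text does not match the assumed format.
    """
    m = re.search(r'(\d+)\s*(day|week|month|year)s?\s+ago', age_text or '', re.I)
    if not m:
        return None
    today = today or date.today()
    days = int(m.group(1)) * _DAYS_PER_UNIT[m.group(2).lower()]
    return (today - timedelta(days=days)).strftime('%Y%m%d')


def caption_duration(text):
    """Sum '<n>h', '<n>m' and '<n>s' components into seconds, or return None."""
    parts = re.findall(r'(\d+)\s*([hms])\b', text or '', re.I)
    if not parts:
        return None
    return sum(int(n) * _SECONDS_PER_UNIT[unit.lower()] for n, unit in parts)


if __name__ == '__main__':
    ref = date(2023, 6, 15)
    print(approx_upload_date('3 weeks ago', today=ref))  # 20230525
    print(caption_duration('1h 02m 03s'))  # 3723
```

Because months and years are counted as fixed 30- and 365-day spans, and the result depends on the day the page is fetched, an `upload_date` derived this way is only approximate. In an actual youtube-dl extractor, the existing `youtube_dl.utils.parse_duration` helper would probably be preferable to a hand-rolled duration parser.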