[vlive] fix extractor for revamped website #101

Merged
merged 10 commits into from
Nov 5, 2020

Conversation

exwm
Contributor

@exwm exwm commented Nov 1, 2020

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Fixes vlive extractor for their revamped website. This should address #60 to some degree but is tested only with VODs and not with other video types such as live video. Thanks to @robindz in #60 (comment) for investigation into the fix.

The basic extractor now supports both video urls (eg https://www.vlive.tv/video/1326) and video post urls that point to the same video (eg https://vlive.tv/post/1-18244258 which points to the same example video 1326).
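A minimal sketch of how the two URL shapes could be told apart. The pattern and the `classify_url` helper are hypothetical illustrations, not the extractor's actual `_VALID_URL`; the assumed ID shapes (digits for videos, `digit-digits` for posts) are taken only from the two example URLs above.

```python
import re

# Hypothetical pattern for illustration only; ID shapes are inferred from the examples.
VIDEO_OR_POST_RE = re.compile(
    r'https?://(?:(?:www|m)\.)?vlive\.tv/'
    r'(?:video/(?P<video_id>[0-9]+)|post/(?P<post_id>[0-9]+-[0-9]+))'
)


def classify_url(url):
    """Return ('video', id) or ('post', id), or None if the URL does not match."""
    match = VIDEO_OR_POST_RE.match(url)
    if not match:
        return None
    if match.group('video_id'):
        return ('video', match.group('video_id'))
    return ('post', match.group('post_id'))
```

A post URL only yields the post ID; the underlying video ID would still have to be read from the page, as discussed later in this thread.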

Adds support for the new channel url format. The old format followed the example https://channels.vlive.tv/FCD4B while the new format follows the example https://www.vlive.tv/channel/FCD4B. The old format redirects to the new format. Additionally, https://www.vlive.tv/channels/FCD4B (plural "channels") will also redirect to https://www.vlive.tv/channel/FCD4B. Also includes some regex improvements here, allowing www and m subdomains in channel urls.
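The channel URL variants described above could be matched with something like the following. This is a sketch under assumptions: the pattern and the `channel_code` helper are hypothetical, and the character class for the channel code is guessed from the single `FCD4B` example.

```python
import re

# Hypothetical pattern for illustration; not the regex used in the pull request.
CHANNEL_RE = re.compile(
    r'https?://(?:'
    r'channels\.vlive\.tv'                    # old format
    r'|(?:(?:www|m)\.)?vlive\.tv/channels?'   # new format, plus the plural variant
    r')/(?P<id>[0-9A-Z]+)'
)


def channel_code(url):
    """Return the channel code from a channel URL, or None if it does not match."""
    match = CHANNEL_RE.match(url)
    return match.group('id') if match else None
```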

From a cursory investigation, playlists seem to have been removed, so I did not touch the playlist extractor code for now.

Edit 1: The live video extractor should now be fixed using the https://www.vlive.tv/globalv-web/vam-web/old/v3/live/<videoSeq>/playInfo endpoint. Thanks to @robindz and @SeonjaeHyeon for finding this endpoint.

Edit 2: Things missing so far:

  • extraction of playlists, which are still present in some form but may no longer be detectable from the url alone

Edit 3: Fixed detection of vlive+ paywalled videos. Login for vlive+ videos is likely missing.

Edit 4: Login is working, but downloading premium content is still broken. There may be a different endpoint or different headers that need to be passed to the current vod key endpoint https://www.vlive.tv/globalv-web/vam-web/video/v1.0/vod/%s/inkey. Playlists are also still not working. Work on these items could be continued in a new pull request.

@exwm exwm changed the title Vlive fix [vlive] fix extractor for revamped website Nov 1, 2020
Contributor

@SeonjaeHyeon SeonjaeHyeon left a comment


After a LIVE video ends, video_status changes to ENDED, so I think it would be better to add this.

@exwm
Contributor Author

exwm commented Nov 2, 2020

It may be better also to use the endpoints https://www.vlive.tv/globalv-web/vam-web/old/v2/vod/<videoSeq> and https://www.vlive.tv/globalv-web/vam-web/old/v2/live/<videoseq> to fetch the video params and fall back to the params fetched from the page html if necessary. Having two endpoints for the video params is a little annoying and I wanted to test if https://www.vlive.tv/globalv-web/vam-web/old/v2/vod/<videoSeq> works for live videos but didn't get the chance yet. Also this may only work for video urls and not post urls since we wouldn't know the video_id until after reading params from the page.

@robindz

robindz commented Nov 2, 2020

https://www.vlive.tv/globalv-web/vam-web/old/v2/vod/<videoSeq> doesn't work for live videos, you'll get the following response (the message translates to "Invalid VideoSeq. <videoSeq>"):

{
    "code": 9115,
    "message": "유효하지 않은 VideoSeq. <videoSeq>"
}
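One way to combine the two endpoints with the page-HTML fallback mentioned earlier is sketched below. It is not the extractor's code: `fetch_json` is a hypothetical callable supplied by the caller, and treating any payload that carries both `code` and `message` as an error is an assumption based only on the response quoted above.

```python
API_BASE = 'https://www.vlive.tv/globalv-web/vam-web/old/v2'


def looks_like_error(payload):
    # Assumption: error payloads carry both "code" and "message",
    # as in the 9115 response quoted above.
    return isinstance(payload, dict) and 'code' in payload and 'message' in payload


def fetch_video_params(video_seq, fetch_json, page_params=None):
    """Try the vod endpoint, then the live endpoint, then fall back to page params.

    fetch_json is a hypothetical callable (url -> parsed JSON).
    """
    for kind in ('vod', 'live'):
        payload = fetch_json('%s/%s/%s' % (API_BASE, kind, video_seq))
        if not looks_like_error(payload):
            return payload
    return page_params
```

Passing the fetcher in as an argument keeps the sketch free of network code and makes the fallback order easy to exercise with a stub.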

@SeonjaeHyeon
Contributor

SeonjaeHyeon commented Nov 2, 2020

For LIVE videos, https://www.vlive.tv/globalv-web/vam-web/old/v3/live/<videoSeq>/playInfo should be used. It also requires appId as a parameter.

@robindz

robindz commented Nov 2, 2020

@SeonjaeHyeon https://www.vlive.tv/globalv-web/vam-web/old/v3/live/<videoSeq>/playInfo shouldn't require appId, just make sure these three headers are set: Host, User-Agent and Referer
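A sketch of a request that sets the three headers named above, using only the standard library. The header values are assumptions: the comment does not say what the Referer or User-Agent should contain, so the video page URL and a generic browser string are used as placeholders, and `build_play_info_request` is a hypothetical helper. The request is only constructed here, not sent.

```python
from urllib.request import Request

PLAY_INFO_URL = 'https://www.vlive.tv/globalv-web/vam-web/old/v3/live/%s/playInfo'


def build_play_info_request(video_seq, user_agent='Mozilla/5.0'):
    """Build (but do not send) a playInfo request with Host, User-Agent and Referer set."""
    return Request(
        PLAY_INFO_URL % video_seq,
        headers={
            'Host': 'www.vlive.tv',
            'User-Agent': user_agent,
            # Assumed value: the page the video would normally be watched on.
            'Referer': 'https://www.vlive.tv/video/%s' % video_seq,
        },
    )
```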

@SeonjaeHyeon
Contributor

@robindz I see. Maybe I should have tested it more.

There is one thing I'm wondering about: as @exwm said, it seems there are no more playlists, but I'm not sure about this.
Could anyone figure it out?

@robindz

robindz commented Nov 2, 2020

As far as I can tell, playlists still exist, https://www.vlive.tv/video/26669 is a video that is part of a playlist.

@exwm
Contributor Author

exwm commented Nov 2, 2020

@robindz curious how you found the playInfo endpoint. I didn't see any requests to that endpoint from my investigation. It contains some useful info which may help make the live extractor work again.

If playlists are only present on video and post urls now, then it won't be possible to use the playlist extractor based only on the url. A cli flag or something may be necessary then to download the whole playlist given a video link.

@robindz

robindz commented Nov 2, 2020

@exwm I just looked at devtools during a live video. Searched for 'm3u8' and eventually found that endpoint.

@etec-masterofsynapse

I tested your changes across 15 vlive channels with my wrapper-script. Had no errors during the runs apart from Vlive+ videos.
It seems like there is a handler missing that shows the usual Vlive+ warning about authentication.
Works well, good job 👍
Subtitles also download correctly.

@exwm
Contributor Author

exwm commented Nov 2, 2020

@etec-masterofsynapse I've pushed a fix for the vlive+ video detection. I haven't looked at extracting and downloading vlive+ videos, though I'm not sure if that was possible before. I don't have a vlive+ account in any case and am not interested in bypassing the paywall.

@etec-masterofsynapse

> I've pushed a fix for the vlive+ video detection. I haven't looked at extracting and downloading vlive+ videos, though I'm not sure if that was possible before. I don't have a vlive+ account in any case and am not interested in bypassing the paywall.

There are some inconsistencies regarding CH+ and Vlive+, so there may have been an update that allowed both.
And we wouldn't bypass any paywall, since all YTDL was doing was capturing the login cookies from Vlive and using them to download the needed video details.

I tried to do that with the URL I have access to, only to get an "Unable to download JSON metadata" error because of an HTTP 500 error.
Interestingly, the vlive extractor doesn't get used in this case; the error comes from the common extractor, which of course wouldn't understand the URL supplied.

@exwm
Contributor Author

exwm commented Nov 2, 2020

> And we wouldn't bypass any paywall, since all YTDL was doing was capturing the login cookies from Vlive and using them to download the needed video details.

Fair enough.

I'm not too clear on the details of vlive+ or the difference between it and CH+. @etec-masterofsynapse Do you have any examples of CH+ videos that don't work without login on the revamped site?

I don't have a vlive+ account or any CH+ access, so it would be hard to test.

@etec-masterofsynapse

etec-masterofsynapse commented Nov 2, 2020

> Fair enough.
>
> I'm not too clear on the details of vlive+ or the difference between it and CH+. Do you have any examples of CH+ videos that don't work without login on the revamped site?
>
> I don't have a vlive+ account or any CH+ access so would be hard to test.

Here is one that definitely needs a login: https://www.vlive.tv/post/0-18401413 // https://www.vlive.tv/video/202607
It's a Vlive+ one specifically, because it's a one-time payment for infinite access.
CH+ is a paid subscription like the YT memberships.

@exwm
Contributor Author

exwm commented Nov 3, 2020

@etec-masterofsynapse I'm able to log in through youtube-dlc, so I think that part of the code is fine. When I try to download https://www.vlive.tv/video/202607 as you linked, I get an error object with a message that says You can use this after buying.....

Any chance you could check what the params dict returned on this line (https://github.com/exwm/yt-dlc/blob/vlive-fix/youtube_dlc/extractor/vlive.py#L120) looks like when logged in with access? Does it contain a path like ["postDetail"]["post"]["officialVideo"]?
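Checking for a nested path like the one above can be done without risking a KeyError. The helper below is a generic sketch; `get_path` is a hypothetical name, and the sample dict in the usage note is invented purely for illustration, not real site data.

```python
def get_path(data, *keys):
    """Walk nested dicts along keys; return None as soon as a key is missing."""
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data
```

For example, `get_path(params, 'postDetail', 'post', 'officialVideo')` returns the nested value when every level exists and `None` otherwise.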

@etec-masterofsynapse

> I'm able to log in through youtube-dlc, so I think that part of the code is fine. When I try to download https://www.vlive.tv/video/202607 as you linked, I get an error object with a message that says You can use this after buying.....
>
> Any chance you could check what the params dict returned on this line (https://github.com/exwm/yt-dlc/blob/vlive-fix/youtube_dlc/extractor/vlive.py#L120) looks like when logged in with access? Does it contain a path like ["postDetail"]["post"]["officialVideo"]?

Could you help me with how I would go about checking the dict you mentioned?
I can write code in C# but I am not too familiar with Python yet.

@exwm
Contributor Author

exwm commented Nov 3, 2020

@etec-masterofsynapse One easy way would be to just add a print call on line 121. I've done that in this gist (https://gist.github.com/exwm/ff8a42f6e860dd4496d15c3496971b3a#file-vlive-py-L121). You could copy the raw contents and replace the ./youtube_dlc/extractor/vlive.py file and run youtube-dlc.

Be careful not to post the whole contents of the dict, it may contain sensitive information.
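To share the shape of the dict without its values, an outline of keys and value types can be printed instead of the dict itself. This is a sketch, not part of the gist linked above; `key_outline` is a hypothetical helper.

```python
def key_outline(data, depth=2):
    """Return the keys of nested dicts down to depth, replacing values with type names."""
    if not isinstance(data, dict) or depth == 0:
        return type(data).__name__
    return {key: key_outline(value, depth - 1) for key, value in data.items()}
```

Calling `print(key_outline(params, depth=3))` in place of `print(params)` shows whether a path exists while leaving tokens and account details out of the output. Note that the keys themselves are still printed, so they should be checked before posting.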

@exwm exwm marked this pull request as draft November 3, 2020 02:03
@SeonjaeHyeon
Contributor

> Any chance you could check what the params dict returned on this line (https://github.com/exwm/yt-dlc/blob/vlive-fix/youtube_dlc/extractor/vlive.py#L120) looks like when logged in with access? Does it contain a path like ["postDetail"]["post"]["officialVideo"]?

I tested some Vlive+ VODs and all of them contain the officialVideo path. That line works for Vlive+ too.
The problem is on this line (https://github.com/exwm/yt-dlc/blob/130599af9476284e7f0b3be4f68a0ff8346fb6ea/youtube_dlc/extractor/vlive.py#L196). This line throws an HTTP 500 error.
I have no idea why the error occurs; maybe something like the cookies or params is wrong.

@exwm
Contributor Author

exwm commented Nov 3, 2020

@SeonjaeHyeon Can you check the network requests made on the website? Does it still make a request to the normal vod key endpoint https://www.vlive.tv/globalv-web/vam-web/video/v1.0/vod/%s/inkey? Or perhaps there is a difference in headers you can see.

@exwm
Contributor Author

exwm commented Nov 4, 2020

I'm thinking this pull request should be ready for review and merge. Fixing downloading of premium content can be done in a new pull request. Should I squash the commits into one?

@exwm exwm marked this pull request as ready for review November 4, 2020 02:36
@SeonjaeHyeon
Contributor

I found out what was missing. Vlive+ VODs require the platformType parameter. The endpoint then becomes https://www.vlive.tv/globalv-web/vam-web/video/v1.0/vod/<videoSeq>/inkey?platformType=PC.
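Building that URL with the query parameter could look like the sketch below. `build_inkey_url` is a hypothetical helper; only the endpoint path and the `platformType=PC` value come from this thread, and any other parameters or headers the endpoint may need are not covered.

```python
from urllib.parse import urlencode

INKEY_URL = 'https://www.vlive.tv/globalv-web/vam-web/video/v1.0/vod/%s/inkey'


def build_inkey_url(video_seq, platform_type='PC'):
    """Return the vod key endpoint URL with the platformType query parameter appended."""
    query = urlencode({'platformType': platform_type})
    return '%s?%s' % (INKEY_URL % video_seq, query)
```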

And actually, even if we download Vlive+ VODs, we cannot play them due to Widevine DRM. I think decrypting DRM is not this plugin's purpose. Besides, downloading replays uses Naver TV's endpoint, and it seems that endpoint doesn't work for DRM content.

Maybe it would be better for the plugin to just raise ExtractorError when an HTTP 500 error occurs.

@exwm
Contributor Author

exwm commented Nov 4, 2020

It definitely feels like a greyer area. I don't think HTTP 500 errors are the right thing to check though. They may occur for other reasons.

I don't think I will be working any further on the handling of premium content. I think a separate pull request should be made if someone wants to pursue that.

@blackjack4494
Owner

Downloading or circumventing/breaking DRM is not allowed!
Please add a check for DRM and drop any such format so it won't get added to formats.
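A check of this kind could be as small as the filter below. This is a sketch under a loud assumption: the thread does not say how DRM would be detected, so the `has_drm` key on each format dict is hypothetical and would have to be set by whatever detection the extractor ends up using.

```python
def drop_drm_formats(formats):
    """Return only the formats not flagged as DRM-protected.

    Assumption: each format is a dict and DRM-protected ones carry a truthy
    'has_drm' key. That key is hypothetical and not confirmed by this thread.
    """
    return [fmt for fmt in formats if not fmt.get('has_drm')]
```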

@exwm
Contributor Author

exwm commented Nov 4, 2020

Downloading premium content was already previously present in the vlive extractor. Specifically, I think CH+ content from V LIVE could be downloaded, though I don't know if that content was DRM-protected. That functionality is now broken, though it may be as trivial as adding a platformType=PC query parameter to the vod key endpoint to fix. I haven't done that though and don't plan to in this pull request.

I don't think circumventing video DRM such as Widevine DRM is even in the scope of the capabilities of youtube-dl/youtube-dlc.

@etec-masterofsynapse

> Downloading premium content was already previously present in the vlive extractor. Specifically, I think CH+ content from V LIVE could be downloaded, though I don't know if that content was DRM-protected. That functionality is now broken, though it may be as trivial as adding a platformType=PC query parameter to the vod key endpoint. I haven't done that though and don't plan to in this pull request.
>
> I don't think circumventing video DRM such as Widevine DRM is even in the scope of the capabilities of youtube-dl/youtube-dlc.

From what I read, CH+ had a different DRM or none at all, which disappeared when downloading the file. Vlive+, however, is equipped with the newer Widevine DRM. Depending on whether it is Level 2 or 3, it could be automatically removed (Level 3 is vulnerable), but since it isn't wanted we won't do it. :)

@exwm
Contributor Author

exwm commented Nov 4, 2020

I think downloading the content is fine as long as there isn't explicit DRM circumvention. Generally, I maintain that it is not the tool but the hand that wields it. Still, I think that work would be better done in a separate pull request by someone else with access to premium content for testing, if it is to be done at all. The discussion here may help guide that potential effort.

@blackjack4494
Owner

> Generally, I maintain that it is not the tool but the hand that wields it.

Try to tell that in court haha. I do think the same since this tool (yt-dlc) isn't purely or solely focused on circumventing at all.

@blackjack4494 blackjack4494 merged commit 206de9b into blackjack4494:master Nov 5, 2020
@kyuyeunk
Contributor

> Playlists are also still not working. Work on these items could be continued in a new pull request.

I have made pull requests addressing the playlist feature in #223 and #224.
I was able to extract playlist info using webpage data extracted from _download_webpage.

RobinD42 pushed a commit to RobinD42/yt-dlc that referenced this pull request Feb 22, 2021
6 participants