[TikTok] Support Sigi-type pages, etc #30479
tiktok now shows metadata in a diff format when logged in
Patching hints, depending on your installation type (substitute PR number 30479 and file
Hi! Patrick
adec287 to 99a2b7c
Add TikTokVM Partial fix for TikTokUser
99a2b7c to 2f65e20
When will this be merged?
state = self._parse_json(
    get_element_by_id('SIGI_STATE', html)
    or self._search_regex(
        r'''(?s)<script\s[^>]*?\bid\s*=\s*(?P<q>"|'|\b)sigi-persisted-data(?P=q)[^>]*>[^=]*=\s*(?P<json>{.+?})\s*(?:;[^<]+)?</script''',
can @dirkf review this?
page_props = self._get_SIGI_STATE(user_id, webpage)
user_data = try_get(page_props, lambda x: x['UserModule']['users'], dict)
if user_data:
It should be

    if not user_data:
        raise ExtractorError(...)
    ...

If the extractor returns `None`, youtube-dl will just silently exit. See yt-dlp/yt-dlp#3776 (comment)
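The point of this review comment can be illustrated with a minimal, self-contained sketch. It is not the PR's code: `ExtractorError` below is a local stand-in for youtube-dl's exception of the same name, and `user_data_from_state` and `extract_user` are hypothetical helpers that only mirror the shape of the `try_get` lookup quoted above.

```python
class ExtractorError(Exception):
    """Local stand-in for youtube-dl's ExtractorError (assumption)."""


def user_data_from_state(page_props):
    """Return page_props['UserModule']['users'] if it is a dict, else None."""
    try:
        users = page_props['UserModule']['users']
    except (KeyError, TypeError):
        return None
    return users if isinstance(users, dict) else None


def extract_user(page_props, user_id):
    """Fail loudly when the state has no user data instead of returning None."""
    user_data = user_data_from_state(page_props)
    if not user_data:
        # Returning None here would make the caller exit silently,
        # so report the problem explicitly.
        raise ExtractorError('Unable to extract user data for %s' % user_id)
    return {'id': user_id, 'users': user_data}
```

With this shape, a missing or empty `UserModule` surfaces as an error message rather than as a run that appears to succeed but produces nothing.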
Originally there was some fallback code that would run `if not user_data`. Don't we get an `ExtractorError` anyway if an IE returns a `None` info_dict? (No, apparently not!)
if result:
    result['display_id'] = user_id
    return result
same here
As observed in yt-dlp/yt-dlp#3776 (comment) the user pages are currently redirecting to a captcha more or less whatever we do wrt cookies and UAs. In a browser with JS disabled and UA set to
Based on #3624, ytdl-org/youtube-dl#30479
Closes #3551
Authored by dirkf, sulyi, pukkandan
Looks like every issue is about this, when will this get merged?
Do we think this will see the light of day? :D Was hoping to be able to use it for a little fun project! Thanks
I think this is also outdated now. There is no
Please follow the guide below
Before submitting a pull request make sure you have:
In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:
Except: this PR subsumes PR "fix tiktok when logged in" #30224, whose author also affirmed this.
What is the purpose of your pull request?
Description of your pull request and other information
TT switched (possibly partially) its framework from NextJS to Sigi, and the persisted state JSON sent in the page changed as a result. Instead of a `<script>` element with id `__NEXT_DATA__`, we get one with id `sigi_persisted_state` and JSON with a slightly different structure.

This PR deals with both types of page format, based on PR #30224 and this patch which gets more metadata.
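Handling both page formats could look roughly like the standard-library sketch below. It is an illustration, not the extractor's actual code: `extract_page_state` and `SCRIPT_IDS` are hypothetical names, and the element ids tried are `__NEXT_DATA__` plus the two ids that appear in the review snippet quoted earlier (`SIGI_STATE` and `sigi-persisted-data`). It also assumes the Sigi variant may wrap the JSON in a JavaScript assignment, as the PR's regex suggests.

```python
import json
import re

# Element ids to try, in order (taken from the description and review snippet).
SCRIPT_IDS = ('__NEXT_DATA__', 'SIGI_STATE', 'sigi-persisted-data')


def extract_page_state(html):
    """Return (script_id, state_dict) for the first state script found,
    or (None, None) if none of the known ids yields valid JSON."""
    decoder = json.JSONDecoder()
    for script_id in SCRIPT_IDS:
        match = re.search(
            r'''(?s)<script\b[^>]*\bid\s*=\s*["']%s["'][^>]*>(.*?)</script'''
            % re.escape(script_id), html)
        if not match:
            continue
        body = match.group(1)
        # Skip any leading "window[...] =" style assignment.
        start = body.find('{')
        if start < 0:
            continue
        try:
            # raw_decode parses one JSON value and ignores trailing text,
            # such as a second assignment after the state object.
            state, _end = decoder.raw_decode(body, start)
        except ValueError:
            continue
        return script_id, state
    return None, None
```

Returning the matched id alongside the state lets the caller pick the right lookup path, since the two JSON structures differ.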
Also, extraction could fail with a timeout (Error 60 in Windows, `SSLError('The read operation timed out',)` in Linux) or connection reset (Error 54 in Windows) due to some weird blocking by whatever fronts TikTok's pages (Akamai, apparently). In order to download the page for parsing, some cookie has to be sent and a way to get it is to make a previous request to the site. The extractor fetched https://www.tiktok.com/ before doing anything else. In yt-dlp, the code fetches the webpage itself twice, commenting that you get 403 otherwise. This PR copies that tactic but instead of fetching the whole page (`GET` request) it just sends a `HEAD` request; if a page is actually returned, rather than an error with a `Set-Cookie` header, it doesn't actually have to be downloaded.

Probably resolves #28741
Resolves #30251
Resolves #30432
Resolves #30439
Resolves #30445
Resolves #30454
Resolves #30470.
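The cookie-priming tactic described above (a `HEAD` request before the real page fetch) can be sketched with the standard library alone. This is an illustration under assumptions, not the extractor's code, which goes through youtube-dl's own request helpers; `build_cookie_opener` and `prime_cookies` are hypothetical names.

```python
import http.cookiejar
import urllib.error
import urllib.request


def build_cookie_opener(*extra_handlers):
    """Return (opener, jar); cookies from every response are stored in jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar), *extra_handlers)
    return opener, jar


def prime_cookies(opener, url, timeout=10):
    """Send a HEAD request so that any Set-Cookie header is recorded.

    Returns the HTTP status code. An error status is not fatal here:
    the cookie processor sees the response before urllib raises HTTPError,
    so cookies sent with an error response are still kept.
    """
    request = urllib.request.Request(url, method='HEAD')
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

A caller would run `prime_cookies(opener, page_url)` once and then fetch the page with the same `opener`, so the stored cookie accompanies the `GET`. Because `HEAD` carries no body, nothing is downloaded twice even when the first request succeeds.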
Finally the non-working `TikTokUserIE` has been resurrected for accessing all the videos of a specific user.

Resolves #30174.