Multi-threading slowdown for YouTube #30641
Comments
So, on a dump T7700 laptop: 2.7: 146s ( Might the difference between yt-dlp and yt-dl be related to the Python version? |
No, I don't think so. |
The code I use is not optimized, of course, but the memory usage (for the non-released objects) was not my concern. |
The point of the linked article was that the opportunities for multi-threading speed-up in current Python environments are limited. Although that doesn't directly address why two versions of The main change in There are easy optimisation opportunities in yt-dl's With the optimised code:
It's a slowish unparallel machine so the speed goes down for more threads. Here's the patch:
I'm not sure of the meaning here. Single-threading results are OK, but I'm interested in the multi-threading usage. I don't think that it's a restriction of Python that is suddenly causing this. What I need is a way to know that the calculation is not done yet, so that the other threads wait for the first calculation and then use its result (from wherever it gets stored) themselves. Is there a |
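The "wait for the first calculation, then reuse it" behaviour asked for above can be sketched with a small compute-once cache. This is only an illustration of the general pattern, not code from youtube-dl: the class name `OnceCache`, its `get` method and the `compute` callback are all hypothetical.

```python
import threading


class OnceCache:
    """Compute each key's value once; concurrent callers for the same key wait.

    Hypothetical helper for illustration only, not part of youtube-dl.
    """

    def __init__(self):
        self._guard = threading.Lock()  # protects the two dicts below
        self._values = {}               # key -> finished result
        self._key_locks = {}            # key -> lock held while computing

    def get(self, key, compute):
        with self._guard:
            if key in self._values:
                return self._values[key]
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        # Only one thread per key gets past this point at a time.
        with key_lock:
            with self._guard:
                if key in self._values:  # filled in while we were waiting
                    return self._values[key]
            value = compute()            # slow work, done outside _guard
            with self._guard:
                self._values[key] = value
            return value
```

The first thread to ask for a key runs `compute()`; threads that arrive meanwhile block on the per-key lock and then read the stored value. If `compute()` raises, nothing is stored and the next waiting thread retries. Whether the cached object itself is safe to use from several threads at once is a separate question that this sketch does not answer.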
This is very surprising if true. yt-dlp's youtube extractor is much more complex than youtube-dl's and is known to be slower (at the benefit of more robustness/features). Verify that you are downloading the same format with both: the default format sorting of yt-dlp is different, so this is my primary suspicion.
Which change do you mean? You could try and bisect the commit history to find the problematic commit
I doubt the culprit is jsinterp. Last I checked, the time taken for n-sig decryption is insignificant when compared to the total run-time of the extractor. Hence why I never bothered with trying to optimize it |
You can try the simple code at the OP yourself.
Last OK fix was created by @dirkf ~23 Nov 21 as a |
Sorry, didn't notice |
Is the code solving the n parameter challenge? As I recall, the Android client doesn't have to do that but it also doesn't get all the formats that users expect. Also, I guess the Go code is compiled down to machine code whereas we're running Python byte code.
OK, so the unthrottling isn't wasted. And users don't get 404s or throttled bandwidth using the media links some time after unthrottling ?
So far we haven't managed to work out how to share the player cache among embedded yt-dl instances, which TBH is too specialised to merit a lot of effort at the moment. My attempt (make the player cache a module-level var and protect accesses to it with a lock) crashes with 8 threads in Py 2.7 (probably doesn't like running the descrambling function created from the player JS simultaneously in multiple threads) but makes no significant difference for either 2 threads in Py 2.7 or 8 threads in Py 3.9, neither of which crash. Of course, I may have done it wrong.
Definitely true if the download is included, but the patch posted above reduces the number of function calls per extraction from more than 70k (that's with one optimisation from the back-port already) to less than 50k, so I think there's a worthwhile saving. As to whether the constant regex expressions used for parsing should be class or global vars, I have no firm idea. In profiling output, the source files with more than 1 item in the top 20 are jsinterp.py, ssl.py, re.py and sre_parse.py.
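One way to obtain call counts like those quoted above is to run the code under `cProfile` and read the total from `pstats`. The sketch below is an assumption about how such a measurement could be made, not the method actually used in this thread; `count_calls`, `parse_inline` and `parse_precompiled` are hypothetical names, and the regex is a stand-in rather than one taken from jsinterp.py.

```python
import cProfile
import pstats
import re


def count_calls(func, *args, **kwargs):
    """Run func under cProfile; return (result, total number of function calls)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stats = pstats.Stats(profiler)
    return result, stats.total_calls


# Stand-in pattern (an assumption, not copied from jsinterp.py)
_NAME_RE = re.compile(r'[a-zA-Z_$][\w$]*')


def parse_inline(exprs):
    # Goes through re.match() and the re module's pattern cache on every item
    return [re.match(r'[a-zA-Z_$][\w$]*', e).group(0) for e in exprs]


def parse_precompiled(exprs):
    # Calls the compiled pattern's method directly
    return [_NAME_RE.match(e).group(0) for e in exprs]
```

Comparing `count_calls(parse_inline, exprs)` with `count_calls(parse_precompiled, exprs)` shows the kind of saving a module-level or class-level constant regex gives: the results are identical, but the precompiled version skips the extra `re` module calls per item.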
My suggestion is that the optimised Also, PyPy 2.7 (pypy-7.3.6) runs the 8x42 test in 44s vs 100s for CPython. |
As I said, not until they expire: after ~6 hours they give a 403.
As I said,
So, if I just copy the |
Apply the patch to the installed master version.
From what I've seen users are very disappointed if all the formats from the web player aren't found by yt-dl. Currently, we don't use the Android client: it may soon be added for age-gate bypassing, as better than nothing. yt-dlp lets the user who cares select from a set of players IIRC. In any case the n response overhead is typically unimportant when compared to the download time, even when unthrottled. OP's use case is somewhat specialised. |
Can you tell me how to do it? |
But, as I'll put this in anyway, the patched source file here. And PR #30643. |
Thank you, I tested it and it's faster than the current version. |
The issue is in fact that jsinterp is slow. But this does not affect yt-dlp much in normal use because of the Android fallback.
Tested with 10 videos on py3.10 with lazy extractors disabled:
So in the default case, is yt-dlp only descrambling for formats that aren't available from the Android client? And for the n-sig comparison you have 10 videos without a challenge, and 10 with? |
Yes
No, non-fragmented formats have the n-sig challenge for all videos. I manually disabled the descrambling code so that we can identify how much time is being spent on it. |
Motivated by: ytdl-org/youtube-dl#30641 (comment) Authored by: dirkf, pukkandan
I did some optimizations to yt-dlp (yt-dlp/yt-dlp@230d5c8) based on dirkf's above patch and here are the new timings:
(The times are scaled so that the unpatched version matches up with what I posted before)
This cuts the time taken by just
PS: It appears that (due to GIL?) the time taken by jsinterp is independent of the number of threads, and so will disproportionately affect the multithreaded use case. |
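The effect described in the PS can be checked with a pure-Python CPU-bound loop run in several threads. This is a rough sketch under the assumption of a standard CPython build with the GIL; `busy_work` is only a stand-in for the JS interpretation and `timed_run` is a hypothetical helper.

```python
import threading
import time


def busy_work(n):
    """Pure-Python CPU-bound loop, a stand-in for interpreting JS expressions."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def timed_run(thread_count, n):
    """Run busy_work(n) once in each thread; return (elapsed seconds, results)."""
    results = [None] * thread_count

    def worker(idx):
        results[idx] = busy_work(n)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(thread_count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results
```

On a GIL build, `timed_run(4, n)` would be expected to take several times as long as `timed_run(1, n)` for a large `n`, because only one thread executes Python bytecode at a time. Exact ratios depend on the machine and interpreter, so none are given here.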
OK, bad news. |
I rather expected that it would be slower. Feel free to run some tests as Pukkandan did and report back. |
Also, the throttling is back.. 😠 P.S. A throttled link for test |
https://github.com/yt-dlp/yt-dlp#user-content-youtube: use At any rate, though yt-dlp still comes out faster for me, I can't see obvious optimisations in the JS processing. We have to evaluate, depending on the challenge, 250000 JS expressions and the main time hogs in the processing are just the routines doing that. |
Your test URL works fine in the latest git master, Py2.7, 3.9. |
@dirkf Thanks, I updated the stats..
I reinstalled From the tests I see that the |
The TVembedded client is used in PR #31043 to do what the Android client would have done. Otherwise no. We'd have to invent some way of configuring the client like yt-dlp has and I'm not convinced that there's enough demand for this feature, especially given that yt-dlp covers most of the need. |
So, after the merge, how can somebody invoke that mode? |
Use an age-gated video! Or modify the code so that yt-dl always thinks the video is age-gated ... |
Hmm, the latter might serve as a mode changer. |
Bad news. |
PR #31043 has been merged. |
Is there an easy way to use the TVembedded mode? |
No, unless patching the YT extractor like this is easy:
- if (is_agegated(playability_status)
- and int_or_none(self._downloader.params.get('age_limit'), default=18) >= 18):
+ if True and ((is_agegated(playability_status)
+ and int_or_none(self._downloader.params.get('age_limit'), default=18) >= 18)):
Well, it was easy, but the results did not change a bit (using the OP script). |
[YouTube] [core] Improve platform debug log, based on yt-dlp ytdl-org/youtube-dl@d1c6c5c Except: * 6ed34338285f722d0da312ce0af3a15a077a3e2a [jsinterp] Add short-cut evaluation for common expression * There was no performance improvement when tested with ytdl-org/youtube-dl#30641 * e8de54bce50f6f77a4d7e8e80675f7003d5bf630 [core] Handle `/../` sequences in HTTP URLs * We plan to implement this differently
Description
Using `youtube_dl` with multiple threads to get information about multiple videos is a lot slower after the last breakage. Using the code below gives me ~70 sec for 42 videos.
With `yt-dlp`, the time it takes for the same videos is ~30 sec. Before the last code change (I'm using the current git code), `youtube_dl` was faster than `yt-dlp`.
Changing `THREAD_NR` didn't change the difference. The processing load is also much higher than before.
Because I use a similar strategy in an app I have, on my system (with an older i7) it got noticeably worse than before.
Trying it on an older laptop made the app totally unusable.
The ids are some random links, you can use whatever you like.
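A minimal sketch of this kind of multi-threaded benchmark (not the original script) is given here for orientation. It assumes the public `youtube_dl.YoutubeDL(...).extract_info(url, download=False)` API and the `quiet` option; the `THREAD_NR` name comes from the description, but its value and the helper names `make_extractor` and `time_extraction` are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

THREAD_NR = 8  # name from the description; the value here is an assumption


def make_extractor():
    """Return a function that fetches metadata for one URL without downloading."""
    import youtube_dl  # imported lazily so time_extraction() works without it

    def extract(url):
        # One YoutubeDL instance per call, so no state is shared between threads
        with youtube_dl.YoutubeDL({'quiet': True}) as ydl:
            return ydl.extract_info(url, download=False)

    return extract


def time_extraction(urls, extract, workers=THREAD_NR):
    """Run extract(url) for every URL on a thread pool; return (seconds, infos)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        infos = list(pool.map(extract, urls))
    return time.perf_counter() - start, infos
```

With a list of video URLs in `urls`, `elapsed, infos = time_extraction(urls, make_extractor())` gives the wall-clock time for the whole batch. Passing a different `extract` function (for example one built on `yt_dlp`) allows a like-for-like comparison.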