Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[options] Added workaround option to execute "n_function" #31187

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

u-n-k-n-o-w-n
Copy link

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

In regards to the recent youtube "n_function" update, if another fancy way comes up that the current jsinterp can't handle, this patch can give users a workaround until it is fixed in jsinterp.
But of course, you need to have selenium installed($ pip install selenium) and webdriver for the browser you want to use. :p

  • [options] Added workaround option to execute "n_function"

@dirkf
Copy link
Contributor

dirkf commented Aug 20, 2022

If we're doing this, let's align it with the PhantomJS support in yt-dlp so that either package can be used. Other extractors might have a use for a JS interpreter (currently PhantomJS has a class in the OpenLoad extractor module).

There should be a function that runs a JS function and returns the result plus some way of determining which "engine" to use, based on options and defaults.

What do you think?

@u-n-k-n-o-w-n
Copy link
Author

You mean to create something like WebDriverJSwrapper?
I think it is a good idea to make the parts universally usable. :)

P.S.
In my personal opinion, I would not recommend using phantomjs.

phantomjs is no longer maintained, and the executable that can be downloaded from the official website is statically linked to the 2016 webkit. In other words, the webkit vulnerabilities discovered since 2016 have not been fixed. Executing externally obtained JS is inherently dangerous, but even more so with a JS engine like this one.

Before that, phantomjs can read and write files and execute processes from JS, so the game is over when you load externally obtained JS. I don't think youtube would do such a thing, but I am not confident enough to say that all video sites would not do it.

When using selenium's webdriver, the JS engine is usually the browser the user normally uses, so there should be no increased risk. Also, even the average user is likely to update their browser, so it is always guaranteed that the latest JS engine can be used with the same level of security as the browser. For example, if a new JS syntax is defined, you do not need to consider it as long as you use the browser with webdriver.

Anyway, if you are going to use an external JS engine, you should use one that is maintained. :p

@dirkf
Copy link
Contributor

dirkf commented Aug 21, 2022

The lack of a plausible alternative is why we have the built-in mini-JS interpreter. In some applications there is no "browser the user normally uses".

Also mentioned here is this more plausible https://github.com/amol-/dukpy

@u-n-k-n-o-w-n
Copy link
Author

Yes, the best thing to do is to continually improve jsinperp so that it can do whatever the browser can do.
The original intent of this patch was

this patch can give users a workaround until it is fixed in jsinterp.

so I totally agree with you on that point.

You mean to create something like WebDriverJSwrapper?
I think it is a good idea to make the parts universally usable. :)

Implemented at 3038610.

Comment on lines +1513 to +1696
f = ('return ((e) => {{'
'const d = decodeURIComponent(e);'
'const p = d.lastIndexOf("}}");'
'const th = d.substring(0, p);'
'const bh = d.substring(p);'
'const m = "var {0};" + th + ";{0} = {1};" + bh;'
'const s = document.createElement("script");'
's.innerHTML = m;'
'document.body.append(s);'
'return {0}("{2}");'
'}})("{3}");').format(dummyfunc, funcname, n_param, compat_urllib_quote(jscode))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this run the entire js?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, when the JS is loaded, an anonymous function is called with _yt_player as an argument.
There is a lot of waste, but this is what the browser normally does when browsing youtube.
Also, the JS that allows the desired function to be referenced from the outside is inserted at the time of this call, so there is no need for complicated parsing. :)

@pukkandan
Copy link
Contributor

If we want to use an external binary, wouldn't node be better than selenium?

@u-n-k-n-o-w-n
Copy link
Author

u-n-k-n-o-w-n commented Sep 1, 2022

@pukkandan
With selenium, the browser's JS engine can be used simply by installing webdriver.
Most people probably use browsers as well, so there is often no need to go to the trouble of installing another JS engine.

In addition, since nodejs was originally designed for writing regular programs in JS, it does not seem like a good idea to pass externally derived JS to be executed(child_process, fs, etc...).
But I am not familiar with nodejs, so if nodejs can run as safely as the various restrictions that browsers impose on JS engines, then we would not need to consider this issue.

@pukkandan
Copy link
Contributor

pukkandan commented Sep 1, 2022

With selenium, the browser's JS engine can be used simply by installing webdriver.
Most people probably use browsers as well, so there is often no need to go to the trouble of installing another JS engine.

youtube-dl is often used in servers/containers/embedded devices where a browser is not normally available. And installing chrome/firefox on a headless system is not trivial

PS: Also, puppeteer playwright is generally preferred over selenium nowadays

@u-n-k-n-o-w-n
Copy link
Author

youtube-dl is often used in servers/containers/embedded devices where a browser is not normally available. And installing chrome/firefox on a headless system is not trivial

Yes, so this is only an workaround.
My goal is not to help all users, but to provide workarounds, even if only to a subset of users.

PS: Also, puppeteer is generally preferred over selenium nowadays

Thanks, is this it?
I'll check it out. :)

youtube_dl/options.py Outdated Show resolved Hide resolved
Co-authored-by: Jouni Järvinen <rautamiekka@users.noreply.github.com>
@dirkf
Copy link
Contributor

dirkf commented Jul 12, 2024

Now that Deno (addressing security issues associated with node.js) exists, I'm more favourably inclined to this functionality. If we can ensure that it works like yt-dlp (which does support Deno), it should be merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants