[YouTube] Refactor JavaScript usage and fix extraction of obfuscated signature deobfuscation function #1108

Merged on Sep 22, 2023 (7 commits)

@AudricV AudricV commented Sep 18, 2023

  • I carefully read the contribution guidelines and agree to them.
  • I have tested the API against NewPipe.
  • I agree to create a pull request for NewPipe as soon as possible to make it compatible with the changed API.

This PR refactors the usage of YouTube's JavaScript base player file in the extractor.

It introduces a single public class, YoutubeJavaScriptPlayerManager, which handles operations on this file and its caches, instead of spreading them across multiple files with questionable designs (such as the deobfuscation function for obfuscated signatures of streaming URLs from HTML5 clients, which lived in YoutubeStreamExtractor).

Extraction and parsing parts are delegated to different package-private classes:

  • YoutubeJavaScriptExtractor, handling the extraction of JavaScript's base player file;
  • YoutubeThrottlingParameterUtils, handling the extraction of the deobfuscation function for the throttling parameter of streaming URLs from HTML5 clients and the detection of this parameter in a given streaming URL;
  • YoutubeSignatureUtils, handling the extraction of the signature timestamp property and the deobfuscation function for obfuscated signatures of streaming URLs from HTML5 clients.

Each extraction process now has a corresponding test.

It also improves the code along the way, fixes the extraction of the deobfuscation function for obfuscated signatures, and sends the signature timestamp as a number instead of a string in InnerTube's player requests, in order to be consistent with HTML5 clients.

This PR introduces important breaking changes:

  • YoutubeJavaScriptExtractor is not public anymore, as the extraction of YouTube's JavaScript base player file isn't intended to be used outside of the extractor;
  • the resetDeobfuscationCode method of YoutubeStreamExtractor has been removed. Use the clearAllCaches method of YoutubeJavaScriptPlayerManager instead, which also clears other cached data;
  • YoutubeThrottlingDecrypter has been renamed to YoutubeThrottlingParameterUtils, which is a package-private class, and its getCacheSize and clearCache methods have been removed. Use the getThrottlingParametersCacheSize and clearThrottlingParametersCache methods of YoutubeJavaScriptPlayerManager, respectively, instead;
  • the DeobfuscateException nested exception class in YoutubeStreamExtractor has been removed. A standard ParsingException is thrown instead when the deobfuscation function couldn't be parsed or executed.

Fixes TeamNewPipe/NewPipe#10347.

@AudricV AudricV added the bug, enhancement, youtube and codequality labels Sep 18, 2023

@Stypox Stypox left a comment

Thanks for this! The code is so much better and well organized. I pointed out a few small things I noticed.

…uscation function extraction

The goal of this class is to decouple the extraction of signature timestamp and
signature deobfuscation function from YoutubeStreamExtractor.

The extraction of the signature deobfuscation function has also been adapted to
support the latest YouTube player versions.

This new class, YoutubeSignatureUtils, doesn't store anything temporary such as
a copy of the player code, which has to be passed where required. It is not
public, as it will be used by a JavaScript player manager class in the future,
in order to better handle fetching, caching and resetting the cache of the
player code.

This commit introduces breaking changes.

For clients, everything is managed in a new class called
YoutubeJavaScriptPlayerManager:
- caching JavaScript base player code and its extracted code (functions and
variables);
- getting player signature timestamp;
- getting deobfuscated signatures of streaming URLs;
- getting streaming URLs with a throttling parameter deobfuscated, if
applicable.

The class delegates the extraction parts to external package-private classes:
- YoutubeJavaScriptExtractor, to extract and download YouTube's JavaScript base
player code: it was already present before and has been edited, mainly to
remove the previous caching system and make it package-private;
- YoutubeSignatureUtils, for the player signature timestamp and the signature
deobfuscation function of streaming URLs, added in a recent commit;
- YoutubeThrottlingParameterUtils, which was originally
YoutubeThrottlingDecrypter, for the deobfuscation function of the throttling
parameter of streaming URLs and for checking whether this parameter is present
in a streaming URL.
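
The check for a throttling parameter in a streaming URL can be illustrated with a minimal sketch. It is not the code of YoutubeThrottlingParameterUtils: the class and method names are hypothetical, and it assumes that the throttling parameter is the `n` query parameter and that a simple regular expression is enough to find it.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only, not the extractor's actual class.
final class ThrottlingParameterSketch {
    // Assumption: the throttling parameter is the "n" query parameter.
    // Requiring '?' or '&' right before it avoids matching e.g. "mn=".
    private static final Pattern N_PARAM = Pattern.compile("[&?]n=([^&]+)");

    private ThrottlingParameterSketch() {
    }

    // Returns the raw (still obfuscated) value of the parameter, if present.
    static Optional<String> findThrottlingParameter(final String streamingUrl) {
        final Matcher matcher = N_PARAM.matcher(streamingUrl);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(final String[] args) {
        System.out.println(findThrottlingParameter(
                "https://example.com/videoplayback?itag=18&n=abc123&sig=x"));
    }
}
```

An empty result means there is nothing to deobfuscate, so the URL can be returned unchanged.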

YoutubeJavaScriptPlayerManager caches and then runs the extracted code if it
has been executed successfully. The cache of deobfuscated throttling parameter
values has been kept: its size can be obtained using the
getThrottlingParametersCacheSize method, and it can be cleared independently
using the clearThrottlingParametersCache method.
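
As a rough sketch of such a cache of deobfuscated values (hypothetical class and method names, and assuming the deobfuscation function can be modelled as a plain string-to-string function), a map keyed by the obfuscated value is enough to expose a size and a clear operation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch, not the actual YoutubeJavaScriptPlayerManager code.
final class ThrottlingCacheSketch {
    // Obfuscated parameter value -> deobfuscated value.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Stand-in for running the extracted JavaScript deobfuscation function.
    private final Function<String, String> deobfuscator;

    ThrottlingCacheSketch(final Function<String, String> deobfuscator) {
        this.deobfuscator = deobfuscator;
    }

    // Runs the deobfuscator only the first time a given value is seen.
    String deobfuscate(final String obfuscated) {
        return cache.computeIfAbsent(obfuscated, deobfuscator);
    }

    int size() {
        return cache.size();
    }

    void clear() {
        cache.clear();
    }
}
```

Here `size` and `clear` play the roles that getThrottlingParametersCacheSize and clearThrottlingParametersCache play in the manager; how the real cache is implemented is not shown in this PR description.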

If an exception occurs during the extraction or the parsing of a function or
property which is not related to JavaScript base player code fetching, it is
stored until caches are cleared, making subsequent failing extraction calls of
the requested function or property faster and consuming fewer resources, as the
result should be the same until the base player code changes.
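
The idea of storing a failure until caches are cleared can be sketched as follows. This is only an illustration with hypothetical names; in particular, treating `IOException` as the marker for "fetching" failures that must not be cached is an assumption of the sketch, not something stated by this commit.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch of caching either a result or the failure that occurred.
final class CachedExtractionSketch<T> {
    private final Callable<T> extraction;
    private T value;
    private Exception cachedFailure;

    CachedExtractionSketch(final Callable<T> extraction) {
        this.extraction = extraction;
    }

    synchronized T get() throws Exception {
        if (cachedFailure != null) {
            // Same player code, same outcome: fail fast without re-extracting.
            throw cachedFailure;
        }
        if (value == null) {
            try {
                value = extraction.call();
            } catch (final IOException e) {
                // Assumption: fetching errors may be transient, so not cached.
                throw e;
            } catch (final Exception e) {
                cachedFailure = e;
                throw e;
            }
        }
        return value;
    }

    // Forgets both the value and any stored failure.
    synchronized void clear() {
        value = null;
        cachedFailure = null;
    }
}
```

After `clear()`, the next `get()` runs the extraction again, which matches the behaviour described above for a changed base player code.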

All caches can be reset using the clearAllCaches method of
YoutubeJavaScriptPlayerManager.

Classes using JavaScript base player code and utilities directly (in the code
and its tests) have also been updated in this commit.

The signature timestamp is used as a number by HTML5 clients, so it should be
used in the same way by the extractor too instead of being a string.

As the timestamp doesn't seem to exceed 5 digits, an integer is used to store
its value.
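
To make the number-versus-string point concrete, here is a small sketch with hypothetical names. The regular expression used to find the property in the player code and the JSON layout of the request fragment are assumptions made for illustration; the commit only states that the value is now sent as a number and stored as an integer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the actual YoutubeSignatureUtils code.
final class SignatureTimestampSketch {
    // Assumption: the property appears as "signatureTimestamp:12345"
    // or "signatureTimestamp=12345" in the base player code.
    private static final Pattern STS_PATTERN =
            Pattern.compile("signatureTimestamp[=:](\\d+)");

    private SignatureTimestampSketch() {
    }

    // Parses the timestamp as an int, as it does not seem to exceed 5 digits.
    static int extract(final String playerCode) {
        final Matcher matcher = STS_PATTERN.matcher(playerCode);
        if (!matcher.find()) {
            throw new IllegalArgumentException("Signature timestamp not found");
        }
        return Integer.parseInt(matcher.group(1));
    }

    // Assumed request fragment: the value is written without quotes, so it
    // is serialized as a JSON number rather than a JSON string.
    static String playbackContextJson(final int signatureTimestamp) {
        return "{\"playbackContext\":{\"contentPlaybackContext\":"
                + "{\"signatureTimestamp\":" + signatureTimestamp + "}}}";
    }

    public static void main(final String[] args) {
        final int sts = extract("var a={signatureTimestamp:19612,b:1};");
        System.out.println(playbackContextJson(sts));
    }
}
```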
…deobfuscation function extraction and execution

@Stypox Stypox left a comment

Thank you!

Stypox commented Sep 22, 2023

I can confirm it works; I tested it in NewPipe.

@Stypox Stypox merged commit 289db11 into TeamNewPipe:dev Sep 22, 2023
1 check passed
@AudricV AudricV deleted the yt_refactor-js-usage branch September 22, 2023 08:43
Development

Successfully merging this pull request may close these issues.

[YouTube] Can't play or download age-restricted music videos
2 participants