
[YouTube] Improve and fix YoutubeJavaScriptExtractor #1087

Merged

AudricV
Member

AudricV commented Aug 2, 2023

This PR improves and fixes part of the extraction of YouTube's base JavaScript player file with the following changes:

  • Enhance the class' documentation;
  • Fix the regular expression fallback on HTML embed watch page;
  • Search HTML script tags first instead of using the regular expression approach, which is now used as a last resort;
  • Compile regular expressions only once, improving the performance of subsequent extraction calls after the cache has been cleared;
  • Provide the original exceptions when fetching or parsing the pages on which the base JavaScript player could be found fails, allowing clients, for instance, to detect when network errors are the cause of the failures;
  • Remove the delegate method that took no video ID and hardcoded one instead, as a video ID can be provided in all cases or, at worst, omitted;
  • Rename the extraction methods and make them package-private, as they are not intended to be used publicly.
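The "compile regular expressions only once" change follows a common Java pattern: hold the compiled `Pattern` in a `static final` field instead of calling `Pattern.compile` on every extraction. The sketch below illustrates that idea only; the `PlayerHashExtractor` class, its method, and the pattern are hypothetical and are not the code from this PR.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating compile-once regexes; not the PR's actual code.
final class PlayerHashExtractor {
    // Compiled a single time when the class is initialized, then reused by every call.
    // The pattern shape is an assumption made for illustration.
    private static final Pattern PLAYER_HASH_PATTERN =
            Pattern.compile("/s/player/([A-Za-z0-9]+)/");

    private PlayerHashExtractor() {
    }

    // Returns the first captured player hash, or null when nothing matches.
    static String extractPlayerHash(final String text) {
        // A new Matcher is created per call; only the Pattern is shared.
        final Matcher matcher = PLAYER_HASH_PATTERN.matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }
}
```

Sharing the `Pattern` is safe because `Pattern` instances are immutable and thread-safe, whereas `Matcher` instances are not, which is why the sketch creates a fresh `Matcher` on each call.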
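The reordering described in the list (search script tags first, keep the regular expression only as a last resort) can be sketched as a two-step lookup. This is a simplified illustration under stated assumptions: the `src` attributes of the page's `<script>` tags are assumed to have already been collected by an HTML parser, and the `BaseJsUrlFinder` name and the `jsUrl` pattern are hypothetical rather than taken from the extractor.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: prefer script tags, fall back to a regex on the raw HTML.
final class BaseJsUrlFinder {
    // Assumed shape of a player URL embedded in page JSON; illustrative only.
    private static final Pattern JS_URL_PATTERN =
            Pattern.compile("\"jsUrl\":\"(/s/player/[A-Za-z0-9]+/[^\"]*base\\.js)\"");

    private BaseJsUrlFinder() {
    }

    // scriptSrcs: src attributes of the page's <script> tags (assumed pre-extracted).
    static Optional<String> find(final List<String> scriptSrcs, final String rawHtml) {
        // 1. Preferred path: a script tag whose src points to base.js.
        for (final String src : scriptSrcs) {
            if (src != null && src.endsWith("/base.js")) {
                return Optional.of(src);
            }
        }
        // 2. Last resort: search the raw HTML with a regular expression.
        final Matcher matcher = JS_URL_PATTERN.matcher(rawHtml);
        if (matcher.find()) {
            return Optional.of(matcher.group(1));
        }
        return Optional.empty();
    }
}
```

Checking structured data first avoids depending on the exact textual layout of the page, so the more fragile regular expression only runs when no matching script tag is present.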
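Providing the original exceptions mainly means passing the underlying failure as the cause when wrapping it, so that clients can inspect the cause chain. The sketch below assumes a hypothetical `PageFetcher` interface and a stand-in `ExtractionException` type; neither is the extractor's real API.

```java
import java.io.IOException;

// Hypothetical sketch of wrapping failures without losing the original exception.
final class PlayerFetchDemo {
    // Stand-in for the extractor's own exception type (assumption).
    static final class ExtractionException extends Exception {
        ExtractionException(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical page fetcher that may fail with a network error.
    interface PageFetcher {
        String fetch(String url) throws IOException;
    }

    private PlayerFetchDemo() {
    }

    static String fetchEmbedPage(final PageFetcher fetcher, final String url)
            throws ExtractionException {
        try {
            return fetcher.fetch(url);
        } catch (final IOException e) {
            // Passing e as the cause keeps the network error visible to callers.
            throw new ExtractionException("Could not fetch " + url, e);
        }
    }

    // Client side: walk the cause chain looking for a network (I/O) error.
    static boolean isNetworkError(final Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof IOException) {
                return true;
            }
        }
        return false;
    }
}
```

Without the cause, a client would only see a generic extraction failure and could not tell a timeout apart from a change in YouTube's page layout.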

The breaking changes introduced are internal to the extractor (the changed methods were not expected to be used by clients) and have been applied where needed, in YoutubeJavaScriptExtractorTest and YoutubeStreamExtractor (where an unneeded initStsFromPlayerJsIfNeeded call has been removed).

AudricV added bug Issue is related to a bug youtube service, https://www.youtube.com/ labels Aug 2, 2023
AudricV requested review from TheAssassin and removed request for TheAssassin August 2, 2023 20:32
AudricV force-pushed the yt_js-extractor-improvements-and-fixes branch from 8d5df9c to a3d160e August 2, 2023 21:05
AudricV marked this pull request as ready for review August 2, 2023 21:36
Stypox merged commit 3faaf43 into TeamNewPipe:dev Aug 6, 2023
1 check failed
AudricV deleted the yt_js-extractor-improvements-and-fixes branch August 6, 2023 10:02
Stypox added a commit to Stypox/NewPipeExtractor that referenced this pull request Aug 6, 2023
Stypox added a commit that referenced this pull request Aug 6, 2023
[YouTube] Update stream mocks after #1087
beasonshu pushed a commit to beasonshu/NewPipeExtractor that referenced this pull request Dec 10, 2023
2 participants