Download webpage after executing JavaScript scripts #30661

Closed
3 tasks done
DarkFighterLuke opened this issue Feb 21, 2022 · 3 comments
@DarkFighterLuke

Checklist

  • I'm asking a question
  • I've looked through the README and FAQ for similar questions
  • I've searched the bugtracker for similar questions including closed ones

Question

I am trying to write a new extractor and I would like to know whether (and, if so, how) it is possible to download a webpage after executing its own JavaScript. The reason is that some fields (in my case the video URL) are filled in by the scripts, so if you download the page without executing them (just as wget does), you get an empty URL.

Thanks.
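
To make the problem concrete, here is a minimal sketch in Python using only the standard library. The page markup, the element id `player`, the variable name `videoUrl` and the example.com URL are all invented for illustration and are not taken from any real site or from youtube-dl. The first helper shows what a wget-style fetch sees: the `src` attribute is empty because nothing has run the script. The second helper shows one workaround that sometimes applies when the value appears literally in the script text: searching for it with a regular expression instead of executing the JavaScript. This is an assumption about a possible approach, not something proposed in this thread, and it does not help when the URL is computed at run time.

```python
import re
from html.parser import HTMLParser

# Hypothetical page source, as a plain HTTP fetch (wget-style) would see it.
# The markup, element id, variable name and URL are invented for illustration.
STATIC_HTML = """
<html><body>
<video id="player" src=""></video>
<script>
  var videoUrl = "https://example.com/media/clip.mp4";
  document.getElementById("player").src = videoUrl;
</script>
</body></html>
"""


class VideoSrcParser(HTMLParser):
    """Collect the src attribute of every <video> tag in the static markup."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "video":
            self.sources.append(dict(attrs).get("src"))


def static_video_sources(html):
    """Return the <video> src values exactly as they appear before any JS runs."""
    parser = VideoSrcParser()
    parser.feed(html)
    return parser.sources


def url_from_inline_script(html):
    """Search the script text for the assignment instead of executing it."""
    match = re.search(r'var\s+videoUrl\s*=\s*"([^"]+)"', html)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(static_video_sources(STATIC_HTML))
    print(url_from_inline_script(STATIC_HTML))
```

The static parse yields an empty string for the video source, while the regular-expression search recovers the URL only because this sample script assigns it as a literal string.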

@DarkFighterLuke
Author

please provide example URL

Try this: https://sxyprn.com/post/5bb443225d5ae.html

@dirkf
Contributor

dirkf commented Feb 21, 2022

First, don't worry about that site. It's handled by the YourPorn extractor (or can be with some tweaks that I have at hand).

As to the larger question that you ask, the answer is that the tools don't exist to enable it.

Actually, these days it's difficult enough to get even a browser that isn't the latest Chrome to handle general web JS. However, there are certain cases where we do need to run some specific JS for an extractor (so far, YouTube itself), and the JSInterpreter module enables that without introducing new dependencies. The risk is always that some site for which it's used suddenly deploys syntax that we haven't allowed for (or even new syntax that breaks all previous JS interpreters, like optional chaining or ES2018 regex literals).

Two extractors use the PhantomJS interpreter mentioned in the link, but that introduces platform and configuration problems, and I don't really know if it works since I don't have it installed.

@DarkFighterLuke
Author

Thank you for the answer, @dirkf, it was all I implicitly wanted to know. I'm happy that it is possible to deal with pages that use JS in youtube-dl (and I should have known, since YouTube uses it).

From your answer I see that an extractor for that site already exists and from what I tried it works.
I am curious about the tweaks you have in mind.
