Download webpage after executing JavaScript scripts #30661

DarkFighterLuke · 2022-02-21T18:54:06Z

Checklist

I'm asking a question
I've looked through the README and FAQ for similar questions
I've searched the bugtracker for similar questions including closed ones

Question

I am trying to write a new extractor and would like to know whether (and if so, how) it is possible to download a webpage after executing its own JavaScript. Some fields (in my case the video URL) are filled in by the scripts, so if you download the page without executing them (just as wget does), you get an empty URL.

Thanks.

DarkFighterLuke · 2022-02-21T18:57:37Z

please provide example URL

https://sxyprn.com/post/5bb443225d5ae.html try this

dirkf · 2022-02-21T23:31:13Z

First, don't worry about that site. It's handled by the YourPorn extractor (or can be with some tweaks that I have at hand).

As to the larger question that you ask, the answer is that the tools don't exist to enable it.

Actually, these days it's difficult enough to get even a browser that isn't the latest Chrome to handle general web JS. However, there are certain cases where we do need to run some specific JS for an extractor (so far, YouTube itself) and the JSInterpreter module enables that without introducing new dependencies; but it's always at the risk that some site for which it's used suddenly deploys a syntax that we haven't allowed for (or even new syntax that breaks all previous JS interpreters, like optional chaining or ES2018 regex literals).

Two extractors use the PhantomJS interpreter mentioned in the link, but that introduces platform and configuration problems, and I don't really know if it works since I don't have it installed.
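To illustrate the situation described in the question: when a field is filled in by a script, the value is sometimes still present in the page source as data inside an inline script, and it can then be read without executing any JS. The following is only a minimal sketch under that assumption. The sample HTML, the `playerConfig` variable, the `video_url` key and the helper name `extract_player_config` are all hypothetical and are not taken from any real site or from youtube-dl's code. It uses only the Python standard library.

```python
import json
import re

# Hypothetical page source: the <video> tag has an empty src until the
# inline script below runs in a browser and fills it in.
SAMPLE_HTML = '''
<html><body>
<video id="player" src=""></video>
<script>
var playerConfig = {"title": "Example clip", "video_url": "https://cdn.example.com/v/123.mp4"};
document.getElementById("player").src = playerConfig.video_url;
</script>
</body></html>
'''


def extract_player_config(html, var_name='playerConfig'):
    """Return the JSON object assigned to `var_name` in an inline script, or None.

    This only works when the assigned value is valid JSON and contains no
    nested '};' sequence; it does not evaluate any JavaScript.
    """
    pattern = r'var\s+%s\s*=\s*(\{.*?\})\s*;' % re.escape(var_name)
    match = re.search(pattern, html, flags=re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except ValueError:
        # The object literal was not plain JSON (e.g. unquoted keys).
        return None


# What a wget-style download sees in the tag itself: an empty src.
static_src = re.search(r'<video[^>]*\bsrc="([^"]*)"', SAMPLE_HTML).group(1)

# What can be recovered from the inline script data instead.
config = extract_player_config(SAMPLE_HTML)
video_url = config['video_url'] if config else None
```

When the value is computed by the script rather than embedded as data, this approach does not apply, which is the case where an interpreter would be needed.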

DarkFighterLuke · 2022-02-22T16:00:17Z

Thank you for the answer @dirkf, it was all I implicitly wanted to know. I'm happy that it is possible to deal with pages that use JS in youtube-dl (and I should have known, since YouTube uses it).

From your answer I see that an extractor for that site already exists and from what I tried it works.
I am curious about the tweaks you have in mind.

DarkFighterLuke added the question label Feb 21, 2022

dirkf closed this as completed Apr 18, 2022

dirkf mentioned this issue Aug 21, 2022

[options] Added workaround option to execute "n_function" #31187

