Wait for page to render before taking screenshot #446

Closed
peterchanws opened this issue May 31, 2022 · 0 comments · Fixed by #483
Labels
web archiving for June-July 2022 work cycle


peterchanws commented May 31, 2022

Here are PURLs showing thumbnails that were not generated properly by the wasSeedPreassemblyWF workflow:

If you look closely, you can see that a "loading" message is displayed instead of the page content.


Instead of navigating to the page and taking a screenshot immediately we should give the page some time to load. Otherwise we might get a loading page, a partial screenshot or no screenshot at all.

In screenshot.js we can try using the timeout and/or waitUntil options of gotoUrls's navigation call, although we will also want to upgrade Puppeteer from v3 (which is a few years old) to v14.
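The waiting strategy described above might look roughly like this. It is a minimal sketch assuming the Puppeteer v14 calls `page.goto(url, { waitUntil, timeout })` and `page.screenshot({ path })`; the function name `screenshotWhenSettled`, the extra `settleMs` pause, and all default values are hypothetical illustrations, not code taken from screenshot.js.

```javascript
// Hypothetical helper: navigate, wait for the page to settle, then screenshot.
// `page` is assumed to be a Puppeteer Page (v14 API). The defaults below are
// illustrative guesses, not values from screenshot.js.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function screenshotWhenSettled(page, url, path, {
  waitUntil = 'networkidle2', // navigation counts as done when <= 2 connections remain for 500 ms
  timeout = 60000,            // fail navigation after 60 s instead of Puppeteer's 30 s default
  settleMs = 2000,            // extra fixed pause for client-side "loading" overlays
} = {}) {
  // Wait for network activity to die down rather than shooting immediately.
  await page.goto(url, { waitUntil, timeout });
  // A blunt fallback: some replay UIs show a "loading" banner that is not
  // tied to network activity, so give them a little more time.
  await sleep(settleMs);
  return page.screenshot({ path });
}
```

With Puppeteer it might be called as `const page = await browser.newPage(); await screenshotWhenSettled(page, url, 'thumbnail.png');`. Taking `page` as a parameter keeps the helper easy to exercise with a stub page, which is useful given how hard it is to reproduce slow-loading archived pages.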

Testing this can be difficult because some URLs may load slowly for various reasons, and may load differently from pywb than from OpenWayback. Mostly we will want to confirm that the process takes longer and that it is not broken.

Below are some instructions for creating a Seed Object in Argo to help with testing:

To test, you will need to create a Seed Object and see what thumbnail gets created. To create a Web Seed Object in Argo, select Register and then Register Items from the menu. Then fill out the form with the following values:

Admin-Policy: Web Archive Seed Object
Collection: Test Web Archive
View Access: World
Initial Workflow: wasSeedPreassemblyWF
Content-Type: webarchive-seed

Add a row, and enter:

Source ID: sul:someuniquevalue
Label:

Then click Register, and find the newly created DRUID by searching for it in Argo.

@ndushay ndushay added the web archiving for June-July 2022 work cycle label Jun 1, 2022
@edsu edsu transferred this issue from sul-dlss-deprecated/swap Jun 1, 2022
@edsu edsu changed the title Examples of thumbnail not generated properly Wait for page to render before screenshot Jun 17, 2022
@edsu edsu changed the title Wait for page to render before screenshot Wait for page to render before taking screenshot Jun 17, 2022
@edsu edsu self-assigned this Jun 20, 2022