Client-side multifile zip download #9245

qqmyers · 2022-12-22T21:25:27Z

What this PR does / why we need it: A possible addition to/replacement for zipping on the server. In this PR, the multi-file download button invokes JavaScript that will download files individually (using direct download if enabled) and create a zip locally, using file names/directoryPaths from the specific datasetVersion being downloaded.
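As a rough illustration of the naming step described above, here is a minimal sketch of how a zip entry path might be derived from a file's directory path and name. The helper `zipEntryPath` and the field names `directoryPath` and `fileName` are hypothetical and chosen only for this example; the PR's actual code may be structured differently.

```javascript
// Hypothetical helper, not taken from the PR: combines a file's directory
// path and file name into the path used for its entry inside the zip.
function zipEntryPath(file) {
  const dir = (file.directoryPath || '')
    .split('/')
    .filter((part) => part.length > 0) // drop empty segments from leading, trailing, or doubled slashes
    .join('/');
  return dir ? `${dir}/${file.fileName}` : file.fileName;
}

// Usage: files without a directory path land at the top level of the zip.
console.log(zipEntryPath({ directoryPath: 'data/raw', fileName: 'survey.csv' })); // data/raw/survey.csv
console.log(zipEntryPath({ directoryPath: '', fileName: 'README.txt' })); // README.txt
```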

Current issues/limitations:

- It isn't clear that this will work on all browsers.
- There's no error handling. It should be possible, for example, to default to the server-side zip if things go wrong or if the browser type/version doesn't support what's needed.
- It should be more efficient, but I've done minimal testing on scalability so far. Nominally, one could allow users to download all files without a size limit. The underlying zip code uses Blobs and Promises and is supposed to scale, but I'm not sure I'm doing everything needed to configure it to scale.
- The logic that disallows downloading a zip when you're over the size limit has not been changed, so this method is still subject to the existing limit.
- The downloaded file is always named dataverse_files.zip as before. It could potentially use the dataset PID/version to create a unique name (with "full" or "partial" to indicate all/some files).
- There is not currently any manifest file in the zip. It should be possible to add one if desired (or to someday make a Bag).
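The fallback to server-side zipping mentioned in the list above could look roughly like the sketch below. It is not part of this PR: `downloadWithFallback`, `browserSupportsClientZip`, `clientZip`, and `serverZip` are hypothetical names, and the feature check is only an assumed minimum based on the Blob and Promise usage noted above.

```javascript
// Hypothetical feature check: assumes client-side zipping needs at least
// Blob, Promise, and fetch. The real requirements may differ.
function browserSupportsClientZip() {
  return typeof Blob !== 'undefined'
    && typeof Promise !== 'undefined'
    && typeof fetch === 'function';
}

// clientZip and serverZip are caller-supplied functions that return Promises.
// The server-side path is used when the browser lacks the needed features
// or when the client-side attempt rejects.
async function downloadWithFallback(clientZip, serverZip, supported = browserSupportsClientZip()) {
  if (!supported) {
    return serverZip();
  }
  try {
    return await clientZip();
  } catch (err) {
    console.warn('Client-side zip failed, falling back to server-side zip:', err);
    return serverZip();
  }
}
```

This only covers failures that surface as a rejected Promise; a failure that crashes the browser tab cannot be caught this way.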

Which issue(s) this PR closes:

Closes #5864

Special notes for your reviewer:
To enable this, I needed to know the datasetVersion in the download code, which required trying to fix #5864: the multifile button doesn't set the datasetVersion in the guestbook by default. If the rest gets delayed, it may be worth pulling out this one-line fix (it's a separate commit).

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

- fix for IQSS#5864

coveralls · 2022-12-22T21:27:43Z

Coverage: 20.005% (-0.004%) from 20.009% when pulling bd13967 on GlobalDataverseCommunityConsortium:clientsidezip into 1aabf69 on IQSS:develop.

donsizemore · 2022-12-22T22:43:49Z

this is wonderful! does it default to original file format, or does it send surrogate copies?

qqmyers · 2022-12-23T11:33:55Z

Right now it is adding ?format=original when it retrieves each file.
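For illustration, a per-file URL with the format parameter might be built as in this sketch. It assumes the files are retrieved through the Data Access API path `/api/access/datafile/{id}`; the helper name `fileDownloadUrl` and the host are hypothetical, and the PR's actual URL construction is not shown in this thread.

```javascript
// Hypothetical helper: builds a download URL for one file and appends the
// format query parameter only when a format is given.
function fileDownloadUrl(siteUrl, fileId, format) {
  const url = new URL(`/api/access/datafile/${encodeURIComponent(fileId)}`, siteUrl);
  if (format) {
    url.searchParams.set('format', format);
  }
  return url.toString();
}

console.log(fileDownloadUrl('https://dataverse.example.org', 42, 'original'));
// https://dataverse.example.org/api/access/datafile/42?format=original
```

Passing the format as an argument would leave room for other values later without changing the call sites.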

donsizemore · 2023-01-10T15:40:17Z

@qqmyers @jdmar3 corrects me: From archival theory there is a case to be made either way, but I do argue that prioritizing original file formats over plaintext tabular data incurs tech debt and may make data unusable. Would perhaps an "original format" check-box or some such require much additional work?

jdmar3 · 2023-01-10T15:41:10Z

@qqmyers Would it be possible to present an option to add ?format=archival in addition to/instead of ?format=original? Also, we're working on some automated user testing for different browsers, so I would be happy to help with testing on different browsers if needed.

EDIT: @donsizemore beat me to the punch!

qqmyers · 2023-01-10T18:39:35Z

The Access Dataset menu at the top of the page allows getting either original or archival format. Currently I have not changed those buttons to use the client-side zipping, but that's a useful addition once we know that it works well for most browsers and sizable datasets.

Both forms are also available at the individual file level, so it is mostly a limitation of the bulk 'Download' button for selected files, regardless of whether the existing server-side zipping or this client-side method is used. I don't want to change that as part of this PR for client-side zipping, but I think both the client-side and the existing server-side algorithms could handle both cases if the user interface work is done to allow it. FWIW: I think the API call to download all files allows you to specify either form as well.

W.r.t. archiving, I would argue that the Bag exports are better than the zip available from the front end (the Bag has fixity info, all the metadata for the dataset, etc.) and since it is privileged, it doesn't run the risk of files being excluded if you don't have permissions (if I recall the zip options in the UI include a manifest that lists files that weren't included due to permissions or size limits). There has also been discussion of the archival Bag exports w.r.t. whether including the ingested formats would be better than the original, but there are issues that have slowed that work, e.g. the fact that Dataverse isn't storing the fixity info for the ingested versions. It would definitely be useful to have some discussion/review of the Bags to decide requirements and priorities.

W.r.t. testing: thanks! The draft PR should work as is, so if we can get a test server (or servers) set up somewhere, it could be tested with different browsers, larger data, etc. I think DataverseNO was going to try to fire one up, and Don could probably do that at Odum as well. Assuming that looks promising, I can look into updating the download-all buttons. That shouldn't involve any new risks: if it works for the one format, it will work for the other.

qqmyers · 2023-02-07T16:18:27Z

This is not ready for Review/QA (hence draft). Testing has shown that the local browser uses significant amounts of memory with large files and can fail with an out-of-memory error. I'm still investigating how to handle this. Perhaps it should not be on the board yet?

pdurbin · 2023-03-09T16:33:13Z

We're excited about it. Let's let Jim size it.

mreekie · 2023-03-14T15:32:23Z

Sizing:

Slid this back to Jim's column as not ready for sizing.

qqmyers added 3 commits December 21, 2022 14:36

initial attempt at zip download in client

b08b649

send datasetversion in all downloads

2892511

- fix for IQSS#5864

POC working with dir paths, draft version.

2d10936

qqmyers added the GDCC: DataverseNO label Dec 22, 2022

qqmyers added 2 commits January 17, 2023 09:14

Merge remote-tracking branch 'IQSS/develop' into clientsidezip

d51458c

Merge remote-tracking branch 'IQSS/develop' into clientsidezip

94d42a9

Merge remote-tracking branch 'IQSS/develop' into clientsidezip

bd13967

mreekie added the Size: Queued (PM has called this issue out specifically for sizing) and bklog: NeedsDiscussion labels Mar 14, 2023

pdurbin removed the bklog: NeedsDiscussion label Oct 7, 2023

pdurbin added the Type: Feature (a feature request) label Oct 9, 2024
