
6505 optimize zip downloads #6986

Merged
merged 33 commits into from
Jul 16, 2020
Conversation

landreev
Contributor

What this PR does / why we need it:

The much-debated "zipper service" - an experimental way to take zipped file downloads (extra-long-running jobs by design) outside of the main application service.

Which issue(s) this PR closes:

Closes #6505

Special notes for your reviewer:

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

landreev added 7 commits May 29, 2020 10:50
- …ate the external "custom download" service. Everything else is done by an outside standalone program (a Java program with its own pom file). (#6505)
- still working on the documentation, so will need to check it in later.
- added some info to the documentation explaining how the zipper does its thing. (#6505)
- (fixed merge conflicts w/ develop - mostly the POST handling added for the /api/access/datafiles/ API)
@coveralls

coveralls commented Jun 23, 2020

Coverage Status

Coverage decreased (-0.03%) to 19.627% when pulling 2553845 on 6505-optimize-zip-downloads into 6daf219 on develop.

@djbrooke djbrooke added this to the Dataverse 5 milestone Jun 24, 2020
@pdurbin pdurbin self-assigned this Jun 24, 2020
Member

@pdurbin pdurbin left a comment


Overall, this looks great and like I said to @landreev, it even works! 😄

@kcondon the testing I did was minimal.

I left a variety of comments.

Review comments were left on:
- doc/sphinx-guides/source/installation/advanced.rst
- scripts/zipdownload/README.md
- scripts/zipdownload/pom.xml
@@ -0,0 +1,2 @@
-- maybe temporary? - work in progress
CREATE TABLE IF NOT EXISTS CUSTOMZIPSERVICEREQUEST (KEY VARCHAR(63), STORAGELOCATION VARCHAR(255), FILENAME VARCHAR(255), ISSUETIME TIMESTAMP);
Member


Should we use our usual @Entity convention and have JPA create and manage this table? And remove this SQL script?

Contributor Author


That was on purpose - I wanted this table to live outside what's managed by JPA, and I didn't want to make its entries entities.
In part because the table is going to be modified externally (the zipper job cleans up after itself, deleting the entries for the jobs it has served), but also to emphasize the custom-ness and hacky-ness of this setup.
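The request-table lifecycle described here could be sketched roughly as follows. The table and column names come from the SQL script above; the key, values, and exact queries are illustrative placeholders, not the actual Dataverse or zipper code:

```sql
-- Application side: register the zip job before redirecting the user
-- to the zipper (one row per requested file; values are placeholders).
INSERT INTO CUSTOMZIPSERVICEREQUEST (KEY, STORAGELOCATION, FILENAME, ISSUETIME)
VALUES ('5a5-2b1c2c0d5f4d', '<storage location of the datafile>', '<filename>', now());

-- Zipper side: look up the files for the job key, stream the zip,
-- then clean up after itself by deleting the served entries.
SELECT STORAGELOCATION, FILENAME FROM CUSTOMZIPSERVICEREQUEST WHERE KEY = '5a5-2b1c2c0d5f4d';
DELETE FROM CUSTOMZIPSERVICEREQUEST WHERE KEY = '5a5-2b1c2c0d5f4d';
```

Since the zipper does its own cleanup with plain SQL like this, keeping the table outside JPA avoids the application's persistence layer holding stale cached entities for rows deleted behind its back.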

@pdurbin pdurbin removed their assignment Jun 24, 2020
@landreev
Contributor Author

@pdurbin Thanks again - as I said earlier, that was super useful feedback, including the things we discussed off GitHub.
Will address and move it back tomorrow.

@kcondon
Contributor

kcondon commented Jul 9, 2020

Got tripped up by SELinux; will review docs.
Need to expand to account for multiple storage types.

@kcondon kcondon assigned landreev and unassigned kcondon Jul 9, 2020
@landreev
Contributor Author

landreev commented Jul 9, 2020

Yes, the SELinux thing does deserve to be mentioned specifically - it can be confusing.
Will commit the fixes shortly.

Got tripped up by selinux, will review docs
Need to expand to account for multiple storage types

@landreev
Contributor Author

landreev commented Jul 9, 2020

Added some extra text to the instructions, as discussed.
Added support for multiple stores of the same type.

@kcondon kcondon assigned kcondon and unassigned landreev Jul 9, 2020
@kcondon
Contributor

kcondon commented Jul 10, 2020

We are seeing a browser-specific issue: the download works in Firefox but fails in Chrome and Microsoft Edge with "download failed due to network issue". The following warning appears in the browser console of the failing browsers (not in Firefox):
core.js.xhtml?ln=primefaces&v=8.0:18 Resource interpreted as Document but transferred with MIME type application/zip: "https://dataverse-internal.iq.harvard.edu/cgi-bin/zipdownload?5a5-2b1c2c0d5f4d".
doRedirect @ core.js.xhtml?ln=primefaces&v=8.0:18
handle @ core.js.xhtml?ln=primefaces&v=8.0:18
(anonymous) @ core.js.xhtml?ln=primefaces&v=8.0:18
c @ jquery.js.xhtml?ln=primefaces&v=8.0:2
fireWith @ jquery.js.xhtml?ln=primefaces&v=8.0:2
l @ jquery.js.xhtml?ln=primefaces&v=8.0:2
(anonymous) @ jquery.js.xhtml?ln=primefaces&v=8.0:2
load (async)
send @ jquery.js.xhtml?ln=primefaces&v=8.0:2
ajax @ jquery.js.xhtml?ln=primefaces&v=8.0:2
send @ core.js.xhtml?ln=primefaces&v=8.0:18
offer @ core.js.xhtml?ln=primefaces&v=8.0:18
handle @ core.js.xhtml?ln=primefaces&v=8.0:18
PrimeFaces.ab @ core.js.xhtml?ln=primefaces&v=8.0:18
onclick @ dataset.xhtml?persistentId=doi:10.70122/FK2/PLD80W:1

There is no error in the server log.

@kcondon kcondon assigned landreev and unassigned kcondon Jul 13, 2020
@mheppler mheppler assigned mheppler and unassigned landreev Jul 14, 2020
@landreev
Contributor Author

@kcondon

We are seeing a browser-specific issue: Download works with FF but fails with Chrome and Microsoft Edge with download failed due to network issue.

It turned out the headers were fine as they were. It was a small error in how I was formatting the "chunks" in the chunked encoding. Firefox, among others, happens to be more forgiving, and was willing to accept and decode the stream anyway; Chrome, apparently, is a stickler for the rules.
It was a one-line fix after all.
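For context on the class of bug being described: in HTTP/1.1 chunked transfer encoding (RFC 7230), each chunk must be framed as the chunk size in hexadecimal, a CRLF, the data bytes, and a trailing CRLF, with a final zero-size chunk terminating the stream. Strict clients like Chrome reject streams that deviate from this framing. A minimal sketch of correct framing (not the actual zipper code; class and method names are made up for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedWriter {
    private static final byte[] CRLF = "\r\n".getBytes(StandardCharsets.US_ASCII);

    // One chunk: <size in hex>CRLF<data>CRLF
    static void writeChunk(OutputStream out, byte[] data) throws IOException {
        out.write(Integer.toHexString(data.length).getBytes(StandardCharsets.US_ASCII));
        out.write(CRLF);
        out.write(data);
        out.write(CRLF);
    }

    // Terminator: a zero-size chunk ("0" CRLF) followed by a final CRLF
    static void finish(OutputStream out) throws IOException {
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeChunk(buf, "hello".getBytes(StandardCharsets.US_ASCII));
        finish(buf);
        // Show the framing with the CRLFs made visible
        String framed = new String(buf.toByteArray(), StandardCharsets.US_ASCII);
        System.out.println(framed.replace("\r\n", "\\r\\n"));
        // prints: 5\r\nhello\r\n0\r\n\r\n
    }
}
```

Getting any of these delimiters slightly wrong (a missing CRLF, a decimal instead of hexadecimal size) produces a stream some clients will still decode and others will refuse, which matches the Firefox-works/Chrome-fails behavior seen above.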

@landreev landreev assigned kcondon and unassigned mheppler Jul 14, 2020
@kcondon
Contributor

kcondon commented Jul 14, 2020

Do I need a new download zipper jar file to see the change and not just another build of dataverse?

@landreev
Contributor Author

@kcondon

Do I need a new download zipper jar file to see the change and not just another build of dataverse?

No Dataverse changes, only the zipper .jar file. That can be obtained on dvn-build.hmdc.harvard.edu again.

Development

Successfully merging this pull request may close these issues.

Optimize Zipping Process on the backend
6 participants