
6505 optimize zip downloads #6986

Merged
merged 33 commits into from
Jul 16, 2020
Conversation

landreev
Contributor

What this PR does / why we need it:

The much-debated "zipper service" - an experimental way to take zipped file downloads (extra-long-running jobs by design) outside of the main application service.

Which issue(s) this PR closes:

Closes #6505

Special notes for your reviewer:

Suggestions on how to test this:

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

landreev added 7 commits May 29, 2020 10:50
- …ate the external "custom download" service. Everything else is done by an outside standalone program (a Java program with its own pom file). (#6505)
- still working on the documentation, so will need to check it in later.
- added some info to the documentation explaining how the zipper does its thing. (#6505)
- (fixed merge conflicts w/ develop - mostly the POST handling added for the /api/access/datafiles/ API)
@coveralls

coveralls commented Jun 23, 2020

Coverage Status

Coverage decreased (-0.03%) to 19.627% when pulling 2553845 on 6505-optimize-zip-downloads into 6daf219 on develop.

@djbrooke djbrooke added this to the Dataverse 5 milestone Jun 24, 2020
@pdurbin pdurbin self-assigned this Jun 24, 2020
Member

@pdurbin pdurbin left a comment


Overall, this looks great and like I said to @landreev, it even works! 😄

@kcondon the testing I did was minimal.

I left a variety of comments.

Review comments were left on:
- doc/sphinx-guides/source/installation/advanced.rst
- scripts/zipdownload/README.md
- scripts/zipdownload/pom.xml
@@ -0,0 +1,2 @@
-- maybe temporary? - work in progress
CREATE TABLE IF NOT EXISTS CUSTOMZIPSERVICEREQUEST (KEY VARCHAR(63), STORAGELOCATION VARCHAR(255), FILENAME VARCHAR(255), ISSUETIME TIMESTAMP);
Member


Should we use our usual @Entity convention and have JPA create and manage this table? And remove this SQL script?

Contributor Author


That was on purpose - I wanted this table to live outside what's managed by JPA, and I didn't want to make its entries entities.
In part because the table is going to be modified externally (the zipper job cleans up after itself, deleting the entries for the jobs it has served), but also to emphasize the custom-ness and hacky-ness of this setup.
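The request-table lifecycle described here could be sketched roughly as follows. The table and column names come from the SQL script above; the key, values, and exact queries are illustrative placeholders, not the actual Dataverse or zipper code:

```sql
-- Application side: register the zip job before redirecting the user
-- to the zipper (one row per requested file; values are placeholders).
INSERT INTO CUSTOMZIPSERVICEREQUEST (KEY, STORAGELOCATION, FILENAME, ISSUETIME)
VALUES ('5a5-2b1c2c0d5f4d', '<storage location of the datafile>', '<filename>', now());

-- Zipper side: look up the files for the job key, stream the zip,
-- then clean up after itself by deleting the served entries.
SELECT STORAGELOCATION, FILENAME FROM CUSTOMZIPSERVICEREQUEST WHERE KEY = '5a5-2b1c2c0d5f4d';
DELETE FROM CUSTOMZIPSERVICEREQUEST WHERE KEY = '5a5-2b1c2c0d5f4d';
```

Since the zipper does its own cleanup with plain SQL like this, keeping the table outside JPA avoids the application's persistence layer holding stale cached entities for rows deleted behind its back.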

@pdurbin pdurbin removed their assignment Jun 24, 2020
@landreev
Contributor Author

@pdurbin Thanks again - as I said earlier, that was super useful feedback, including the things we discussed off GitHub.
Will address and move it back tomorrow.

@kcondon
Contributor

kcondon commented Jul 9, 2020

Got tripped up by SELinux; will review docs.
Need to expand to account for multiple storage types.

@kcondon kcondon assigned landreev and unassigned kcondon Jul 9, 2020
@landreev
Contributor Author

landreev commented Jul 9, 2020

Yes, the SELinux thing does deserve to be mentioned specifically - it can be confusing.
Will commit the fixes shortly.

Got tripped up by selinux, will review docs
Need to expand to account for multiple storage types

@landreev
Contributor Author

landreev commented Jul 9, 2020

Added some extra text to the instructions, as discussed.
Added support for multiple stores of the same type.

@kcondon kcondon assigned kcondon and unassigned landreev Jul 9, 2020
@kcondon
Contributor

kcondon commented Jul 10, 2020

We are seeing a browser-specific issue: the download works in Firefox but fails in Chrome and Microsoft Edge with "download failed due to network issue". The following warning appears in the browser console of the failing browsers (not in Firefox):
core.js.xhtml?ln=primefaces&v=8.0:18 Resource interpreted as Document but transferred with MIME type application/zip: "https://dataverse-internal.iq.harvard.edu/cgi-bin/zipdownload?5a5-2b1c2c0d5f4d".
doRedirect @ core.js.xhtml?ln=primefaces&v=8.0:18
handle @ core.js.xhtml?ln=primefaces&v=8.0:18
(anonymous) @ core.js.xhtml?ln=primefaces&v=8.0:18
c @ jquery.js.xhtml?ln=primefaces&v=8.0:2
fireWith @ jquery.js.xhtml?ln=primefaces&v=8.0:2
l @ jquery.js.xhtml?ln=primefaces&v=8.0:2
(anonymous) @ jquery.js.xhtml?ln=primefaces&v=8.0:2
load (async)
send @ jquery.js.xhtml?ln=primefaces&v=8.0:2
ajax @ jquery.js.xhtml?ln=primefaces&v=8.0:2
send @ core.js.xhtml?ln=primefaces&v=8.0:18
offer @ core.js.xhtml?ln=primefaces&v=8.0:18
handle @ core.js.xhtml?ln=primefaces&v=8.0:18
PrimeFaces.ab @ core.js.xhtml?ln=primefaces&v=8.0:18
onclick @ dataset.xhtml?persistentId=doi:10.70122/FK2/PLD80W:1

There is no error in the server log.

@kcondon kcondon assigned landreev and unassigned kcondon Jul 13, 2020
@mheppler mheppler assigned mheppler and unassigned landreev Jul 14, 2020
@landreev
Contributor Author

@kcondon

We are seeing a browser-specific issue: Download works with FF but fails with Chrome and Microsoft Edge with download failed due to network issue.

It turned out the headers were fine as they were. It was a small error in how I was formatting the "chunks" in the chunked encoding. Firefox, among others, happens to be more forgiving, and was willing to accept and decode the stream anyway; Chrome, apparently, is a stickler for the rules.
It was a one-line fix after all.
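For context on the class of bug being described: in HTTP/1.1 chunked transfer encoding (RFC 7230), each chunk must be framed as the chunk size in hexadecimal, a CRLF, the data bytes, and a trailing CRLF, with a final zero-size chunk terminating the stream. Strict clients like Chrome reject streams that deviate from this framing. A minimal sketch of correct framing (not the actual zipper code; class and method names are made up for illustration):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedWriter {
    private static final byte[] CRLF = "\r\n".getBytes(StandardCharsets.US_ASCII);

    // One chunk: <size in hex>CRLF<data>CRLF
    static void writeChunk(OutputStream out, byte[] data) throws IOException {
        out.write(Integer.toHexString(data.length).getBytes(StandardCharsets.US_ASCII));
        out.write(CRLF);
        out.write(data);
        out.write(CRLF);
    }

    // Terminator: a zero-size chunk ("0" CRLF) followed by a final CRLF
    static void finish(OutputStream out) throws IOException {
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeChunk(buf, "hello".getBytes(StandardCharsets.US_ASCII));
        finish(buf);
        // Show the framing with the CRLFs made visible
        String framed = new String(buf.toByteArray(), StandardCharsets.US_ASCII);
        System.out.println(framed.replace("\r\n", "\\r\\n"));
        // prints: 5\r\nhello\r\n0\r\n\r\n
    }
}
```

Getting any of these delimiters slightly wrong (a missing CRLF, a decimal instead of hexadecimal size) produces a stream some clients will still decode and others will refuse, which matches the Firefox-works/Chrome-fails behavior seen above.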

@landreev landreev assigned kcondon and unassigned mheppler Jul 14, 2020
@kcondon
Contributor

kcondon commented Jul 14, 2020

Do I need a new download zipper jar file to see the change and not just another build of dataverse?

@landreev
Contributor Author

@kcondon

Do I need a new download zipper jar file to see the change and not just another build of dataverse?

No Dataverse changes, only the zipper .jar file. That can be obtained on dvn-build.hmdc.harvard.edu again.

Development

Successfully merging this pull request may close these issues.

Optimize Zipping Process on the backend
6 participants