File download via Zipdownloader tool creates damaged archives? #11207
This is not a brand-new installation of the standalone zipper, is it? In other words, is this something that used to work properly and then just stopped working?
If you look at the first few bytes of the zip file, you see these 6 extra bytes before the normal zip header (the 6 bytes are "2000" followed by the 2-byte DOS newline, i.e. "2000\r\n").
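For context, in HTTP/1.1 chunked transfer encoding each chunk is prefixed with its size in hexadecimal followed by CRLF, and 0x2000 is 8192, i.e. an 8 KiB buffer. A minimal Python sketch of that framing (the payload here is made up for illustration):

```python
def chunk_frame(payload: bytes) -> bytes:
    """Wrap a payload in HTTP/1.1 chunked-transfer framing:
    hex size + CRLF, then the payload, then CRLF."""
    return format(len(payload), "x").encode("ascii") + b"\r\n" + payload + b"\r\n"

# Pretend this is one 8 KiB buffer of zip output, starting with the
# zip local-file header "PK\x03\x04".
buffer = b"PK\x03\x04" + b"\x00" * (8192 - 4)
framed = chunk_frame(buffer)

print(framed[:6])  # b'2000\r\n' -- the 6 stray bytes seen before "PK"
```

This is exactly why the stray bytes read "2000" followed by CRLF: they are the hex chunk size of an 8 KiB write.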
Thanks for the quick reply! Yes, it used to work properly in previous versions. We're not 100% sure it only stopped working after our upgrade to v6.5 from 6.3 (via 6.4), but as far as I know we changed nothing in the configuration of the script or access to it. I'll ask a colleague to have a look at the specific log you mentioned; hopefully, we'll know more soon.
The Apache error log looks quite strange. For a single zip download, I get the same message almost 1,300 times:
And at least the head of the zip file is okay:
Well, it's clearly not okay: it has the 6 extra bytes added before the normal "PK" header. (See lines 32 to 34 in commit 6be4f20; "2000" is hex for 8192.)
What appears to be happening in your case is that, instead of sending the byte stream generated by the zipper to the client as is, Apache (for whatever reason) decides to chunk-encode it again, so the client receives it with these extra header and closing bytes, which of course breaks the zip format. I can't imagine that the Dataverse-side upgrade (to 6.5) could have anything to do with this. It would be more likely a change in the version of Apache installed, or maybe a change in the Apache configuration(?). Looking at the headers from your zipper, you appear to be using Apache 2.4.62; ours is 2.4.37 (we are also using the zipper, and are not experiencing this issue).
OK, that may have been too much unnecessary/not particularly useful information. In more practical terms, I can try to build a version of the zipper that does not apply the chunked encoding to the stream, and we can see if that fixes it.
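While debugging, a download damaged this way can in principle be repaired by stripping the chunked framing back out. A rough Python sketch, assuming the file is a well-formed chunked stream (a real repair script should validate its input more carefully):

```python
def unchunk(data: bytes) -> bytes:
    """Strip HTTP/1.1 chunked-transfer framing from a byte stream."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol], 16)   # chunk size is ASCII hex
        if size == 0:                   # "0\r\n\r\n" terminates the stream
            break
        start = eol + 2
        out += data[start:start + size]
        pos = start + size + 2          # skip the chunk's trailing CRLF
    return bytes(out)

# Round-trip check with two small fake chunks and the terminator:
stream = b"4\r\nPK\x03\x04\r\n" + b"3\r\nabc\r\n" + b"0\r\n\r\n"
assert unchunk(stream) == b"PK\x03\x04abc"
```

Running something like this over a damaged archive should leave a zip that begins with the normal "PK" header again.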
This is clearly dumped in the log for every 8 KB of the output that the zipper produces. Whether this is actually a symptom of the zip stream issue you are experiencing, I'm not sure. It's been a while now, but I recall our production admins reporting that the zipper was flooding the logs with some repeating message back when we first deployed it (even though it was working properly)... and we must have just addressed that by suppressing the log messages, by changing …
Okay. I was about to write that this also happens when I call the Zipper via the shell (with export QUERY_STRING=...). But then, that is intentional.
We have not, however, changed anything in the Apache configuration itself.
The zipper already has a …
Please try this experimental version: https://github.com/IQSS/dataverse/raw/refs/heads/11207-external-zipper-chunking-issue/scripts/zipdownload/target/zipdownloader-0.0.1-test.jar, with the … and see what happens? It may or may not work; no promises.
It works. Great. Thank you very much.
Interesting. I'm going to operate under the assumption that this was due to a change in how newer versions of Apache handle content generated under cgi-bin. So, I'll get this new option merged in and I'll update the documentation accordingly, and add a release note explaining that instances using the tool may need the new version... That said, I'm not entirely sure how many other Dataverse installations, other than yours and ours, are in fact using this zipper tool at this point.
What steps does it take to reproduce the issue?
What happens?
The extraction of files from the zip fails:
```
unzip -v dataverse_files.zip
Archive: dataverse_files.zip
warning [dataverse_files.zip]: 10342 extra bytes at beginning or within zipfile (attempting to process anyway)
```
Apparently, each file has a bad zipfile offset.
We tested the multi-file download for different datasets on different machines and checked the Payara and Apache logs, but found nothing obvious there. The single-file download (i.e., not using the zipper tool) works perfectly.
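As a rough sanity check (assuming Apache re-chunked the stream in 8 KiB pieces), the numbers in this thread are consistent with each other: each chunk adds 8 framing bytes, "2000\r\n" before the payload and "\r\n" after it, so the 10342 extra bytes that unzip reports line up with the ~1,300 repeated log messages:

```python
# 8 framing bytes per 8 KiB chunk: "2000\r\n" (6) + trailing "\r\n" (2)
framing_per_chunk = len(b"2000\r\n") + len(b"\r\n")

chunks = 10342 // framing_per_chunk
print(chunks)  # ~1292 chunks, close to the ~1,300 repeated log lines
```

This is only back-of-the-envelope arithmetic, not something taken from the logs themselves.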
To whom does it occur (all users, curators, superusers)?
All users.
Which version of Dataverse are you using?
v 6.5
Any related open or closed issues to this bug report?
I found none.