Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CONTENT_DOWNLOAD_MISMATCH with successful file download #709

Closed
taiyokato opened this issue May 13, 2019 · 18 comments · Fixed by #1063 or #2023
Closed

CONTENT_DOWNLOAD_MISMATCH with successful file download #709

taiyokato opened this issue May 13, 2019 · 18 comments · Fixed by #1063 or #2023
Labels
api: storage Issues related to the googleapis/nodejs-storage API. external This issue is blocked on a bug with the actual product. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@taiyokato
Copy link

Environment details

  • OS: Windows 10 Pro 1809
  • Node.js version: v10.15.3 LTS
  • npm version: 6.4.1
  • @google-cloud/storage version: 2.5.0

Steps to reproduce

  1. Prepare a bucket with Cloud KMS Customer managed Key encryption. All files are private.
  2. Upload an JPEG image file using:
function UploadFile() {
    storage.bucket("somebucket").upload("somefolder/someimage.jpg", {
        gzip: true,
        destination: "somefolder/someimage.jpg",
    }).then(results => {
        console.log("upload OK");
    })
    .catch(err => {
        console.error("Error: ", err);
    });
}
UploadFile()
// Upload works fine.
  1. Try downloading the img file using:
function DownloadFile() {    
    const file = storage.bucket("somebucket").file("somefolder/someimage.jpg");
    
    file.download({
        destination: "someimage.jpg",
        // validation: false, // Why even disable validation?
    }).then(res => {
        console.log("DL OK");
    }).catch(err => {
        console.error(err); // Throws CONTENT_DOWNLOAD_MISMATCH
    })
}
DownloadFile()
  1. Correct decompressed file is downloaded AND exception is thrown: code=CONTENT_DOWNLOAD_MISMATCH message=The downloaded data did not match the data from the server. To be sure the content is the same, you should download the file again.

I've read through Issue 566, but seems like not a solution.

storage.bucket("somebucket").file("somefolder/someimage.jpg").download({validation: false}); works, but there's no reason to or should disable validation.

To make sure if the hashes are actually a mismatch, I ran a local md5sum check on the downloaded and original image files.

$ md5sum downloaded.jpg original.jpg
a045a2e8e6b8d84aa8a319bcdba05419  downloaded.jpg
a045a2e8e6b8d84aa8a319bcdba05419  original.jpg

Downloaded file is a match to the original file.

BTW, This problem doesn't happen if the image is not gzipped on upload.

Thanks

@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label May 13, 2019
@bcoe bcoe added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. needs work This is a pull request that needs a little love. and removed triage me I really want to be triaged. labels May 14, 2019
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label May 14, 2019
@JustinBeckwith JustinBeckwith added the 🚨 This issue needs some love. label May 20, 2019
@stephenplusplus stephenplusplus added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed triage me I really want to be triaged. labels May 21, 2019
@jiren
Copy link
Member

jiren commented Jun 4, 2019

Validation error because of crc32c check failure.

failed = !validateStream.test('crc32c', hashes.crc32c.substr(4));

  • On upload, storage client calculates crc32c after gzip compression. Google storage service is also calculating crc32c on gzip data and send back crc32c value in upload response for validation.
  • While on download google storage client receiving uncompress data but header x-goog-hash has the crc32c value of gzip compressed data. (same as generated on upload ). Here crc32c value is mismatching because the client is calculating the crc32c value on uncompressed data.

Same error if validation set to md5

const file = storage.bucket("Bucket Name").file("file path on google storage");
file.download({
  destination: "downloaded.jpg",
  validation: 'md5'
}).then(res => {
  console.log("DL OK");
}).catch(err => {
  console.error(err);
})

So if gzip response header not found in download response then we should ignore crc32c check on uncompressed data.

@yoshi-automation yoshi-automation removed the 🚨 This issue needs some love. label Jun 4, 2019
@jkwlui jkwlui assigned stephenplusplus and unassigned jkwlui Jul 18, 2019
@stephenplusplus
Copy link
Contributor

@taiyokato I have reproduced this when using already-compressed content, specifically a PNG. In your example, you use a JPEG, which is also already compressed. The Storage docs advise against this due to "undesired behaviors" (see Using gzip on compressed objects).

In this case, the server is responding with the actual image contents without the gzip wrapper. We don't run the decompression, because it isn't compressed data we are receiving. Validation still runs, however, and that means the hashes we compute are not against the gzipped data value, but the actual data.

@jiren Currently, we run validation on all downloads. Can you clarify the conditions we should bypass the validation?

@taiyokato
Copy link
Author

@stephenplusplus

Ah, that clears the mystery.
Thanks for looking into it (and the link).

Should I keep this issue open or close it?

@jiren
Copy link
Member

jiren commented Jul 22, 2019

Here is the current condition.

nodejs-storage/src/file.ts

Lines 1300 to 1303 in 71a4f59

if (shouldRunValidation) {
validateStream = hashStreamValidation({crc32c, md5});
throughStreams.push(validateStream);
}

This condition can be change as per data is compress or not. If the gzip header is present then only validate CRC and MD5.

if (shouldRunValidation && isCompressed) { 
  validateStream = hashStreamValidation({ crc32c, md5 }); 
  throughStreams.push(validateStream); 
} else { 
  crc32c = false 
}

Same thing is implemented in go storage client.

if length != 0 && !res.Uncompressed && !uncompressedByServer(res) {

https://github.com/googleapis/google-cloud-go/blob/71971b35976fc2f904ed2772536790a5458d9996/storage/reader.go#L205

@stephenplusplus
Copy link
Contributor

stephenplusplus commented Sep 3, 2019

@AVaksman I'll hand this over to you for now, as I believe this is blocked on the outcome of your investigation with the Storage team: https://docs.google.com/document/d/1jz91hfD5AJ8ghXnTPSnCuBa4NdkOPq085y9hBBQFu8M. I'm happy to do any implementation work if necessary when the time comes.

@stephenplusplus stephenplusplus removed the needs work This is a pull request that needs a little love. label Oct 3, 2019
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Nov 9, 2019
@crwilcox crwilcox removed the 🚨 This issue needs some love. label Nov 14, 2019
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Nov 14, 2019
@stephenplusplus
Copy link
Contributor

@AVaksman is there a game plan for how to address this?

@google-cloud-label-sync google-cloud-label-sync bot added the api: storage Issues related to the googleapis/nodejs-storage API. label Jan 29, 2020
@danielwhatmuff
Copy link

@AVaksman @stephenplusplus - this bug has just caused us an issue in our production environment which we have had to workaround, any eta on a fix?

@jkwlui
Copy link
Member

jkwlui commented Feb 5, 2020

I think what we'd like to end up doing here is skipping validation if the file has the metadata Content-Encoding: gzip and the file is served decompressed by the server.

We would insert that logic after the getMetadata() call, which is when the client makes a second API call to get the checksums from the server.

@frankyn Does this temporary fix make sense?

@frankyn
Copy link
Member

frankyn commented Feb 5, 2020

That sounds like a good plan in the mean time.

Please move forward it and add a note that this is only temporary.

@crwilcox
Copy link
Contributor

crwilcox commented Feb 6, 2020

Opening this so we track the other part of the fix. Will mark as external as it won't likely happen in this repo.

@crwilcox crwilcox reopened this Feb 6, 2020
@crwilcox crwilcox added external This issue is blocked on a bug with the actual product. and removed 🚨 This issue needs some love. labels Feb 6, 2020
@danielwhatmuff
Copy link

Any eta on a fix for this? Upgraded to 4.3.1 but see the same issue.

@jkwlui
Copy link
Member

jkwlui commented Feb 18, 2020

We believe the issue should have been fixed with 4.3.1, so that's weird that it's still showing up for you @danielwhatmuff. I'd like to get more information -

Could you help us by providing:

  • sample code that reproduces the issue
  • the file type that's causing the issue?

I appreciate your patience and working with us to resolve this!

@stephenplusplus stephenplusplus removed their assignment Apr 1, 2020
@stephenplusplus
Copy link
Contributor

@frankyn do you know where this issue stands? I don't think there was anything left for us to do from this library.

@wayjake
Copy link

wayjake commented Sep 21, 2020

I'm running into this with some uploaded .png and .jpegs. I do not GZIP on upload at all. I tried the MD5 check to see if that cleared it up, but it didn't. The odd thing is that it only breaks when opening multiple read streams for different files to zip them. When I attempt to read the one file in a separate file, it works. For the life of me, the two sets of code appear the same.

@frankyn
Copy link
Member

frankyn commented Oct 7, 2020

We are still working with the backend team to resolve this issue correctly. We are working on a workaround and will post an update when available.

@kirillgroshkov
Copy link

Is there an update on this? The issue is affecting us.

@frankyn
Copy link
Member

frankyn commented Nov 19, 2021

News leading to closure of this issue: As of end of this week Cloud Storage API will always respect Accept-Encoding: gzip.

I'm closing this issue as we can now safely validate checksum because GCS respects not decompressing data server side before sending back to customers.

Thanks for your patience

@frankyn frankyn closed this as completed Nov 19, 2021
@frankyn
Copy link
Member

frankyn commented Nov 19, 2021

Spoke to soon, Rolled back the change so we will need to follow-up again when we have an update. Apologies, I jinxed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the googleapis/nodejs-storage API. external This issue is blocked on a bug with the actual product. priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet