Verify uploads using ETag #2585
Comments
The problem is that for downloads there's no way of knowing how big each chunk was, so we can't reliably reproduce ETags. For uploads we send a Content-MD5 on each chunk that S3 verifies, in addition to the protections TLS offers. Optionally, we can also send a Content-SHA256, which S3 will verify as well and which is less prone to collisions. You can set this in the config file by using |
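To make the per-chunk check above concrete, here is a minimal sketch of how a Content-MD5 value can be computed for each chunk. The header value is the base64 encoding of the raw MD5 digest. The function names and the way the data is split are assumptions for illustration only; this is not the CLI's actual code.

```python
import base64
import hashlib


def content_md5(chunk: bytes) -> str:
    """Return a Content-MD5 header value: base64 of the raw 16-byte MD5 digest."""
    return base64.b64encode(hashlib.md5(chunk).digest()).decode("ascii")


def chunk_headers(data: bytes, chunk_size: int):
    """Yield (part_number, Content-MD5 value) for each chunk of `data`.

    `chunk_size` is whatever the uploader chose; the receiver can only verify
    a chunk against the header that was sent along with that same chunk.
    """
    for number, offset in enumerate(range(0, len(data), chunk_size), start=1):
        yield number, content_md5(data[offset:offset + chunk_size])
```

Because each value covers only its own chunk, this catches corruption of a chunk in transit but says nothing about whether the chunks were split and reassembled in the right order.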
My reasoning was that this way we can checksum the whole file. That would make sure that the whole splitting and joining process works as intended. You are right that for downloads the only way to validate the file would be to start using some meta tag, like rclone or s4cmd do. I'd actually recommend this. // Just a quick note: do I understand correctly that there are 3 levels of hash checking for each chunk:
The linked docs aren't clear to me about when is and when 2. and 3. are enabled. |
MD5 is always present unless you remove the handler that adds it. We don't explicitly document this as a thing you can do, but it is possible. The default for SHA256 is a bit confusing. It is enabled by default for all non-streaming requests (streaming being uploading / downloading). For streaming requests it is disabled by default if all of the following conditions are met:
The I understand wanting a whole-file checksum, but I would not want to rely on undocumented behavior for content verification. The S3 docs only say that multipart upload ETags are not MD5 digests. If S3 were to come out with an official algorithm, I would support adding it. The next best option is to do custom hashing using an explicit mechanism and to put that hash in the object metadata. This is off the table for us because we have a policy of never implicitly adding data, especially when it would cost money. While metadata doesn't cost additional money (iirc), you are pretty heavily restricted in how much each object can have (2 KB). Implicitly sending up our own metadata would further limit how much can otherwise be provided. |
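For readers who want the "custom hash in object metadata" approach as an explicit, opt-in step of their own, here is a rough sketch. The metadata key name `sha256` and all function names are hypothetical. The upload and download calls are left out on purpose; only the hashing and comparison are shown.

```python
import hashlib

METADATA_KEY = "sha256"  # hypothetical key name, chosen for this example only


def file_sha256(path, block_size=1024 * 1024):
    """Hash a file in fixed-size blocks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_metadata(path, existing=None):
    """Return a copy of the user's metadata with the whole-file hash added."""
    metadata = dict(existing or {})
    metadata[METADATA_KEY] = file_sha256(path)
    return metadata


def verify_download(path, metadata):
    """Return True only if a stored hash exists and matches the local file."""
    expected = metadata.get(METADATA_KEY)
    return expected is not None and file_sha256(path) == expected
```

A hex SHA-256 digest is 64 characters, so together with its key it takes a noticeable share of the 2 KB metadata budget mentioned above, which is part of why doing this implicitly is problematic.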
Thanks for the great explanation about MD5 and SHA256, most of the information from your comment would be great in the official help! For example that MD5 is always verified, chunk-by-chunk, by default in all scenarios. OK, I agree about not verifying the whole file if the ETag calculation isn't official. I'd be surprised if it was changed after so many years, but I understand your point. |
Apparently the AWS CLI already validates MD5 checksums when down-/uploading files. By adding a flag in its config it does this explicitly, using a SHA256 checksum. Source: aws/aws-cli#2585 (comment) This makes our own implementation of hashing superfluous.
Based on StackOverflow answers, I wrote a Python implementation which correctly calculates both multi-part and single-part file ETags.
I've tested it on different file sizes and the calculation is correct. Would it be possible to add it to the `aws s3 cp` and `aws s3 sync` commands as a verification step?
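The implementation mentioned above is not included in this thread. As an illustration only, the commonly observed (and, as noted in the comments, undocumented) scheme can be sketched as follows: a single-part ETag is the hex MD5 of the content, and a multipart ETag is the hex MD5 of the concatenated raw part digests followed by `-` and the part count. The function name and the 8 MB default chunk size are assumptions. The result only matches if the chunk size equals the one used for the upload, and it does not apply to objects whose ETag is not MD5-based.

```python
import hashlib

# Assumed part size; it must equal the size actually used for the upload.
DEFAULT_CHUNK_SIZE = 8 * 1024 * 1024


def estimate_etag(path, chunk_size=DEFAULT_CHUNK_SIZE):
    """Estimate an S3-style ETag for a local file (undocumented scheme)."""
    part_digests = []
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            part_digests.append(hashlib.md5(chunk).digest())

    if len(part_digests) <= 1:
        # Treated as a single-part upload: plain hex MD5 of the content.
        # Note: a multipart upload with exactly one part would instead
        # carry a "-1" suffix, which this sketch does not model.
        whole = part_digests[0] if part_digests else hashlib.md5(b"").digest()
        return whole.hex()

    # Multipart: MD5 over the concatenated raw digests, plus the part count.
    combined = hashlib.md5(b"".join(part_digests)).hexdigest()
    return f"{combined}-{len(part_digests)}"
```

Since downloads give no indication of the original part size, a check like this can only be attempted when the uploader's chunk size is known.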