Multipart uploads can lead to loss of metadata #1145
Comments
This behavior is known. If I were to guess, the files losing their metadata are the ones being multipart copied. If a file is multipart copied, metadata is not automatically copied over like it is for a non-multipart copy. The problem is that transferring all of the information exactly for a multipart copy would require roughly 4 to 5 more calls (such as ...). If that is the case, there is currently one workaround, which is to avoid using multipart copies. Take a look at this pull request: #1122. Using the config file, you can set the threshold at which multipart operations start. So if you set the threshold higher than the maximum size of your files, but less than 5GB, you should be better off.
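A rough sketch of how one might pick such a threshold from a list of object sizes. The helper name `pick_multipart_threshold` is hypothetical and not part of the AWS CLI, and treating 5GB as 5 * 1024^3 bytes is an assumption. It only does the arithmetic; the resulting value would still have to be written to the CLI config by hand.

```python
# Hypothetical helper, not part of the AWS CLI: names are illustrative only.
# Assumption: "5GB" is taken to mean 5 * 1024**3 bytes.
FIVE_GB = 5 * 1024 ** 3


def pick_multipart_threshold(sizes_in_bytes):
    """Return a threshold (in bytes) strictly above the largest object.

    Returns None when no threshold below 5GB can cover every object,
    i.e. when the workaround described above cannot apply to all files.
    """
    largest = max(sizes_in_bytes, default=0)
    candidate = largest + 1  # strictly larger than every object
    if candidate >= FIVE_GB:
        return None
    return candidate
```

If this returns None, at least one object is too large for the workaround, and its metadata would need to be checked separately after the copy.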
Not to be sour or ungrateful, but I just sank a few hours into figuring this out, combined with #319. I had issues with file metadata randomly appearing and not appearing in s3 sync with no discernible pattern (initially I thought that s3 had some sort of unintentional "memory" of the metadata of the files being deleted and recreated), and then I finally found #1145. Workaround of ... This is the kind of known issue that should be in screaming red letters all over the s3 sync documentation. I just blew a few hours on it, which in the grand scheme of things is not that bad, because I could have synced thousands of production data files over thinking it worked fine, and then deleted the originals. Silent data corruption errors are not cool, and I realize fixing something might not be simple on your end, but in the interim please let your users know. @kyleknap - you were right, it was the multipart thing.
Another side effect of @makmanalp's workaround seems to get around this issue too: aws configure set s3.multipart_threshold 128MB
@makmanalp +1000, this is a fucking insane, massive bug. It'd be one thing if sync just dropped the content-encoding altogether, but the way things are currently, one might think "huh, I should double-check that ...
Not to minimize the insane, massive bug, but this originates as a limitation of S3. aws/aws-sdk-java#367
"Given there is a workaround" is perhaps generous in the case of ... This limitation appears in the CLI documentation, though it is somewhat buried considering the severity of the issue.
Yeah, definitely, it seems much saner to default to never using multi-part copies for objects with a content-encoding or other metadata that will be dropped by it. (Or even to always drop the metadata, maybe with a flag to preserve it).
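To make that suggestion concrete, here is a minimal sketch of what such a default could look like as a decision function. This is a hypothetical policy, not how the AWS CLI actually behaves. The function name, the strategy labels, and the reading of 5GB as 5 * 1024^3 bytes are all assumptions for illustration.

```python
# Hypothetical policy sketch: NOT the AWS CLI's actual behavior.
# Assumption: the single-operation copy limit is 5 * 1024**3 bytes.
FIVE_GB = 5 * 1024 ** 3


def choose_copy_strategy(size, threshold, has_extra_metadata):
    """Pick a copy strategy that avoids multipart copies when metadata is at stake.

    size               -- object size in bytes
    threshold          -- configured multipart threshold in bytes
    has_extra_metadata -- True if the object carries content-encoding or
                          other metadata a multipart copy would drop
    """
    if size < threshold:
        return "single"  # small enough that multipart never applies
    if not has_extra_metadata:
        return "multipart"  # nothing to lose, so multipart is fine
    if size < FIVE_GB:
        return "single"  # prefer a plain copy to keep the metadata
    # Too large for a single-operation copy: multipart is unavoidable,
    # so the caller should be warned that metadata may be dropped.
    return "multipart-metadata-at-risk"
```

The last branch is the case the workaround cannot cover, which is why a loud warning (or an explicit flag, as suggested) would still be needed there.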
Good Morning! We're closing this issue here on GitHub, as part of our migration to UserVoice for feature requests involving the AWS CLI. This will let us get the most important features to you, by making it easier to search for and show support for the features you care the most about, without diluting the conversation with bug reports. As a quick UserVoice primer (if not already familiar): after an idea is posted, people can vote on the ideas, and the product team will be responding directly to the most popular suggestions. We’ve imported existing feature requests from GitHub - Search for this issue there! And don't worry, this issue will still exist on GitHub for posterity's sake. As it’s a text-only import of the original post into UserVoice, we’ll still be keeping in mind the comments and discussion that already exist here on the GitHub issue. GitHub will remain the channel for reporting bugs. Once again, this issue can now be found by searching for the title on: https://aws.uservoice.com/forums/598381-aws-command-line-interface -The AWS SDKs & Tools Team This entry can specifically be found on UserVoice at: https://aws.uservoice.com/forums/598381-aws-command-line-interface/suggestions/33168427-does-aws-s3-sync-will-loss-metadata-sometimes
Based on community feedback, we have decided to return feature requests to GitHub issues.
Hi,
😱
Scenario 1:
x-amz-meta-json
Scenario 2:
Sync files between two buckets that belong to two different accounts in the same region (for example, one account is for development and the other is for production). Some files' metadata is also lost, with keys like x-amz-meta-json.
Note: the batch of files that lost metadata is the same in both scenarios above.
Does anyone have the same issue?
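For anyone trying to confirm which files are affected, a small comparison like the one below might help. It is only a sketch: `find_metadata_loss` is a hypothetical helper, and it assumes the user metadata for each object has already been collected into plain dicts (for example from HeadObject responses), which is outside the scope of this snippet.

```python
# Hypothetical helper: compares already-collected metadata, does not call S3.
def find_metadata_loss(source_meta, dest_meta):
    """Return {object_key: [metadata names missing or changed in the destination]}.

    source_meta / dest_meta map object keys to dicts of user metadata.
    Objects that do not exist in the destination are skipped.
    """
    lost = {}
    for key, src in source_meta.items():
        if key not in dest_meta:
            continue  # not synced at all, so not a metadata-loss case
        dst = dest_meta[key]
        missing = sorted(name for name, value in src.items() if dst.get(name) != value)
        if missing:
            lost[key] = missing
    return lost
```

Running it against both scenarios and comparing the results would show whether the same batch of files really is affected each time.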