s3 sync not syncing files locally that have the same size but have been modified #1074
The main reason we have that logic is so that the sync can be round-tripped. For example, given the CLI's current logic, let's begin with some file
Then when we check the bucket, the file's last modified time will be newer than the local file:
So if we try to resync it appropriately does not re-upload the file:
Now if we try to sync via downloading, it will also not download the file:
If for downloads we used the same logic as for uploads, the sync would always download the file, because the last modified time of the S3 file (the source for this sync) will be newer than that of the local file (the destination for this sync), even though neither file has changed. If we were able to explicitly set the last modified time of the S3 object, we could get your proposed logic to work. Given the round-trip reasoning, we are going to keep the default sync strategy logic as is. If you don't mind me asking, what was the use case for the sync strategy logic you were proposing? We could possibly add it as an additional feature/parameter.
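The round-trip argument above can be sketched in a few lines of Python. The timestamps and the `compare_time` helper here are illustrative stand-ins, not the CLI's actual code; returning True means "treat the file as unchanged and skip the transfer":

```python
# Sketch of the default sync comparison described in this thread
# (paraphrased, not the literal aws-cli source).
def compare_time(src_time, dest_time, cmd):
    delta = dest_time - src_time
    if cmd in ("upload", "copy"):
        return delta >= 0   # skip if destination (S3) is newer or equal
    elif cmd == "download":
        return delta <= 0   # skip if destination (local) is newer or equal

# Round trip: a file is created locally at t=100 and uploaded;
# the S3 object's LastModified becomes t=110.
local, s3 = 100, 110
assert compare_time(local, s3, "upload")     # re-sync up: correctly skipped
assert compare_time(s3, local, "download")   # re-sync down: also skipped

# If downloads used the upload rule ("skip only when the destination is
# newer"), this unchanged file would be re-downloaded on every sync:
assert not ((local - s3) >= 0)
```

The asymmetry is deliberate: it makes an upload followed by a download a no-op, at the cost of never pulling down a newer same-size S3 object.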
I thought there must be a reason, as it seemed too obvious, and I understand now why it's related to #599. Our use case: we sync some large directories from a master EC2 server to S3. Then a fleet of EC2 servers regularly syncs down from S3. With the current strategy this basically doesn't work, as files that have changed but are the same size don't sync down to the fleet.
Thanks! This will have to be a feature request. We will look into/consider adding such a sync strategy.
@kyleknap, we found that using --exact-timestamps solves our problem. Not sure if it's meant to be the solution to this or not, but it seems to work perfectly. Thanks.
That's right. I forgot about that option. Good to hear that it worked out for you. Closing issue. |
I have Bazman's same use case: we sync up to S3, and then sync from S3 down to a different machine. I don't think --exact-timestamps solves it. In reading the documentation I'm concerned about the AWS sync logic: are you saying that the default behavior when syncing down from S3 is that an older S3 version will replace the newer local version? I think I see what you're getting at, but by and large it seems confusing to have sync logic in which an older file replaces a newer file. Maybe we could look at how rsync handles this kind of situation? Thank you.
Thinking about this more, I see how it could be useful to have s3>local overwrite newer files on the local end. For example, if a user changes something locally and you want the S3 content to refresh/overwrite the user's content. However, I still hope there can be an option for the above, as well as an option to make local>s3 and s3>local behave the same way (wherein a file goes from source to destination only if it's newer in the source, doesn't exist, etc.). Thank you!
This is a really nasty bug and led to an error in production that took me days to track down. Basic tools like this should just work without users having to know implementation details. Why doesn't sync use something sane like a checksum instead of a timestamp or file size?
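A content hash would sidestep both the size and timestamp pitfalls. As a sketch, a local MD5 could be compared against S3's ETag, which for non-multipart uploads happens to be the hex MD5 of the object body (multipart ETags are not plain MD5s, so this is not universal). `local_md5` is a hypothetical helper, not part of the CLI:

```python
import hashlib

def local_md5(path):
    """Hex MD5 of a local file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```

Comparing this digest against the object's ETag (with its surrounding quotes stripped) detects same-size edits, at the cost of reading every local file on each sync.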
It is 2018 and this is still a bug. In short, if you modified a file locally then ran
The file that was modified after the bucket file will then be overwritten and the modified date on the local file will now match the older S3 bucket file's modified date. This is contrary to the documentation for the sync function provided by AWS: https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html |
Just to make the use case practical, we had a diff in a properties file of
This change led to the same size property file, so the sync from the s3 bucket to the local machine failed to get the change. Obviously very different behavior. |
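That failure mode is easy to demonstrate (the property values below are made up): an edit can change the content without changing the byte count, so a size-only comparison sees nothing to do.

```python
# Hypothetical properties-file edit: same length, different content.
old = b"timeout=30\n"
new = b"timeout=45\n"

assert len(old) == len(new)   # a size-only check considers these in sync
assert old != new             # but the content genuinely changed
```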
I'm being affected by this due to different timezones between the uploader and downloader.
It seems that the explanation given in the first answer might be obsolete: when I sync a file from S3 to local, the local file then has the same last-modified time as shown on S3. I need a strategy opposite to the default one: when syncing from S3 to local, if S3 is newer, update the local file. |
@kyleknap would you mind reopening this issue or point me to a more recent duplicate of this issue, if any? |
We are still experiencing this problem with a local file. Any update from the code maintainers?
I have the same problem. Need to sync data from local machine to S3. Request the ticket to be reopened. |
I have the same problem. |
I ran into the same problem, and using the "--exact-timestamps" option worked. As per the documentation, the default behavior is to ignore same-sized items unless the local version is newer than the S3 version.
I originally posted this as a reply to #406, but I think it's worth posting as a new issue, as it seems like a relatively straightforward but significant bug.
When doing an s3 sync from s3 to local, newer files (based on modified time) on S3 won't sync to local if the file size is the same.
It seems like it's just because the compare_time() function is wrong.
In aws-cli/awscli/customizations/s3/syncstrategy/base.py, lines 207 - 223 (as of 9f56e8f):
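The referenced block looks roughly like the following standalone sketch (paraphrased, not the literal source of that revision; `FileStat` and the plain-number `total_seconds` are stand-ins for the real CLI types, so check the repository for the exact code). `compare_time()` returning True means the times are considered equivalent and the file is skipped:

```python
from collections import namedtuple

# Stand-in for the CLI's file metadata object (illustrative only).
FileStat = namedtuple("FileStat", ["last_update", "operation_name"])

class SyncStrategy:
    def total_seconds(self, delta):
        # The real code converts a timedelta; plain numbers suffice here.
        return delta

    def compare_time(self, src_file, dest_file):
        """Return True if the files' times are considered the same
        (i.e. the sync should skip this file)."""
        src_time = src_file.last_update
        dest_time = dest_file.last_update
        delta = dest_time - src_time
        cmd = src_file.operation_name
        if cmd == "upload" or cmd == "copy":
            # Skip when the destination (S3) is at least as new.
            return self.total_seconds(delta) >= 0
        elif cmd == "download":
            # Skip when the destination (local) is at least as new --
            # so a newer same-size S3 object is never downloaded.
            return self.total_seconds(delta) <= 0
```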
There shouldn't be any logic difference between cmd == "upload", cmd == "copy", and cmd == "download". It's comparing source and destination file times, so all it needs to know is whether the source file is newer than the destination or not.
i.e. Change the above block of code just to:
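Sketched standalone (with illustrative names), the proposal is one direction-independent rule that skips only when the destination is at least as new as the source:

```python
def compare_time(src_time, dest_time):
    # Same rule for upload, copy, and download: the file is "unchanged"
    # (skip the transfer) only if the destination is at least as new.
    return (dest_time - src_time) >= 0

assert compare_time(100, 110)       # destination newer: skip
assert not compare_time(110, 100)   # source newer: transfer it
```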
Seems to fix the problem.
This same code also caused an issue in earlier versions, where the sync strategies weren't split out and the comparison code lived in comparator.py.