s3 sync repeatedly downloads files without modification #5730
Hi @shankerwangmiao, I'm not quite following the scenario here. The current behavior should be that if the file sizes are the same and the last modified time in s3 is greater (newer) than that of the local file, no sync is performed. Can you describe the scenario in more detail and/or provide debug logs demonstrating the behavior?
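For readers following along, the rule described above can be modeled roughly as below. This is a simplified, hypothetical sketch for downloads only (`should_download` and its parameters are illustrative names, not aws-cli's actual API). It follows the maintainer's description and the CLI's documented default, under which same-sized items are ignored unless the local version is newer than the S3 version.

```python
from datetime import datetime, timezone


def should_download(local_size: int, local_mtime: datetime,
                    remote_size: int, remote_mtime: datetime) -> bool:
    """Hypothetical model of the default s3 -> local comparison (no extra flags)."""
    if local_size != remote_size:
        # Any size difference triggers a download.
        return True
    # Same size: skip unless the local copy is newer than the S3 object.
    # Note that an S3 object that is newer but the same size is NOT downloaded,
    # which is why the reporter's same-size apt repo updates were missed.
    return local_mtime > remote_mtime


t_old = datetime(2020, 1, 1, tzinfo=timezone.utc)
t_new = datetime(2020, 1, 2, tzinfo=timezone.utc)
print(should_download(10, t_old, 10, t_new))  # same size, S3 newer: skipped
```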
This issue (#599) might be useful.
Hi, thanks for your information. My use case is building a local mirror of an S3 bucket. Sometimes, especially for things like apt repos, files are updated in the S3 bucket but their size is unchanged. As a result, the default behavior is not what I want, so I use --exact-timestamps. However, this option introduces another issue. As I pointed out in my original issue, when s3 sync stores the last modified timestamp in the local filesystem, the resolution is one second. On the next invocation of s3 sync, however, the timestamp read back from the local filesystem will seldom equal the one stored in the S3 attribute, since the latter has a much higher resolution. So nearly all the files are downloaded from the S3 bucket again. I suggest:
1. when storing a timestamp in the local filesystem, increase the resolution;
2. when comparing timestamps, take into account the different resolutions of the local filesystem and the S3 system.
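Suggestion 2 could look something like the sketch below: both timestamps are truncated to a common resolution (one second by default) before the exact comparison, so the sub-second digits that the local filesystem never received no longer cause a mismatch. The names `exact_match_at_resolution`, `to_ns`, and `resolution_ns` are hypothetical; integer nanoseconds are used to avoid floating-point rounding.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ns(dt: datetime) -> int:
    """Convert an aware datetime to integer nanoseconds since the epoch (exact)."""
    return (dt - EPOCH) // timedelta(microseconds=1) * 1000


def exact_match_at_resolution(local_mtime_ns: int, remote_mtime: datetime,
                              resolution_ns: int = 1_000_000_000) -> bool:
    """Hypothetical --exact-timestamps check that ignores digits finer than
    `resolution_ns` (default: one second) on both sides."""
    remote_ns = to_ns(remote_mtime)
    # Floor division truncates both values to a multiple of the resolution.
    return local_mtime_ns // resolution_ns == remote_ns // resolution_ns


# A remote timestamp with microseconds versus a local mtime stored in whole seconds.
remote = datetime(2020, 1, 1, 12, 0, 0, 123456, tzinfo=timezone.utc)
local_ns = int(remote.replace(microsecond=0).timestamp()) * 10**9
print(exact_match_at_resolution(local_ns, remote))
```

The one-second default matches the resolution described in this thread; a smaller `resolution_ns` would only be appropriate if the stored mtime actually kept finer digits.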
I've updated my original issue to reflect the option I'm using; sorry for leaving out the details.
Hi @shankerwangmiao, thanks for the update and clarification! Are you using an AWS S3 bucket or a third-party implementation? I'm asking specifically about this part of your comment referring to the timestamp in the S3 bucket:
This also sounds like issue #5369, as AWS S3 object timestamps are stored at one-second resolution. See this comment: #5369 (comment)
Hi, thanks for your information. I can confirm that my symptom is exactly the same as that in #5369. However, I have no idea about the exact implementation on the server side. The reply from the server contains timestamps with microsecond resolution:
Since aws s3 sync is so useful, it would be nice if rounding on the CLI side could be considered.
Hi @shankerwangmiao, since the AWS S3 standard is to store time in seconds, any other implementation would need to do the same to ensure compatibility. Changing that in the CLI could cause incompatibilities, especially if the AWS standard were to change as well. I'll pass this feedback along to the S3 team to see about making this implementation detail more visible. I appreciate the information!
When s3 sync is invoked with --exact-timestamps to keep a local copy of files up to date with a remote S3 bucket, files that have not been modified in the bucket are downloaded repeatedly. It is caused by the following code snippet:
aws-cli/awscli/customizations/s3/utils.py
Lines 718 to 720 in f074938
When the modification time is stored in the local filesystem, its resolution is one second. On the next sync, however, that timestamp cannot match the one in the remote S3 bucket, whose resolution is one microsecond.
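The mismatch can be reproduced locally without S3. The sketch below assumes, as described above, that the downloaded file's mtime is set to the remote timestamp truncated to whole seconds, and that --exact-timestamps then requires the two values to be identical. It also shows suggestion 1 from the thread: passing nanoseconds to `os.utime` keeps the sub-second part on filesystems that store it. All helper names are hypothetical, and the remote value is a made-up `LastModified` with microseconds like the one the reporter's server returned.

```python
import os
import tempfile
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def remote_ns(dt: datetime) -> int:
    """Exact nanoseconds since the epoch for an aware datetime."""
    return (dt - EPOCH) // timedelta(microseconds=1) * 1000


def store_truncated(path: str, dt: datetime) -> None:
    """Set the mtime to whole seconds only (the behavior described in the issue)."""
    secs = int(dt.timestamp())
    os.utime(path, (secs, secs))


def store_full_precision(path: str, dt: datetime) -> None:
    """Set the mtime with nanosecond precision (suggestion 1 in the thread)."""
    ns = remote_ns(dt)
    os.utime(path, ns=(ns, ns))


def exact_match(path: str, dt: datetime) -> bool:
    """Hypothetical --exact-timestamps check: the timestamps must be identical."""
    return os.stat(path).st_mtime_ns == remote_ns(dt)


remote = datetime(2020, 1, 1, 12, 0, 0, 123456, tzinfo=timezone.utc)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "object")
    open(path, "wb").close()

    store_truncated(path, remote)
    truncated_matches = exact_match(path, remote)  # False: the .123456 s is lost

    store_full_precision(path, remote)
    full_matches = exact_match(path, remote)  # True only if the filesystem keeps sub-second mtimes

print(truncated_matches, full_matches)
```

Because the truncated comparison fails on every run, each sync sees the file as changed and downloads it again, which is the symptom reported here.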