
s3 sync --exact-timestamps flag ignored for uploads #4460

Open
danil-smirnov opened this issue Sep 1, 2019 · 13 comments
Labels
feature-request A feature should be added or improved. p2 This is a standard priority issue s3sync s3syncstrategy s3

Comments

@danil-smirnov

There is an error-prone situation, described in AWS S3 Sync Issues, where same-sized files fail to be updated.

To help users avoid these issues, the --exact-timestamps flag was added to the s3 sync command in this PR.

However, the implementation applies to download operations only, i.e. when files are copied from S3 to local.

Hence, in the case of uploading files from local to S3, this flag does not work (it is ignored). I think this is rather counterintuitive and should be changed so that the flag is respected for all the commands.
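To make the reported behaviour concrete, here is a hedged sketch (not the actual aws-cli source) of the default `s3 sync` decision rule as this issue describes it: a file is copied only when the sizes differ or the source is strictly newer than the destination, so a same-size file with an older source timestamp is silently skipped. The `should_sync` function and its parameters are illustrative names, not CLI internals.

```python
def should_sync(src_size, src_mtime, dst_size, dst_mtime, exact_timestamps=False):
    """Illustrative sync rule: return True if the file should be copied."""
    if src_size != dst_size:
        return True                      # any size difference triggers a copy
    if exact_timestamps:
        return src_mtime != dst_mtime    # any timestamp mismatch triggers a copy
    return src_mtime > dst_mtime         # default: only a strictly newer source copies

# Same size, source older than destination: the default rule skips it...
print(should_sync(100, 1000, 100, 2000))                         # -> False
# ...but an exact-timestamps comparison would copy it.
print(should_sync(100, 1000, 100, 2000, exact_timestamps=True))  # -> True
```

This is exactly the gap the issue reports: on uploads only the default branch runs, so same-size, older-timestamp files are never refreshed.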

@kyleknap
Contributor

kyleknap commented Sep 4, 2019

Could you elaborate on the exact scenario in which you would want to use --exact-timestamps for uploads? I believe making it only applicable to downloads was done on purpose, because we do not have any control over the timestamp of the object stored in S3 (the timestamp of the upload is used). So if you uploaded a file to S3 with s3 sync and ran s3 sync again using --exact-timestamps, the CLI would re-upload the file to S3 because the timestamp in S3 would be different from the local file's.
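The point above can be sketched with a toy model (an assumption-laden simulation, not real S3 semantics): S3 records the upload time as the object's LastModified, so it can essentially never equal the local file's mtime, and an exact-timestamps rule applied to uploads would flag a mismatch on every subsequent sync.

```python
# Toy model of the re-upload loop described above. All timestamps are
# made-up integers; "s3_mtime" stands in for the object's LastModified.
local_mtime = 1000          # local file last modified at t=1000
upload_time = 1500          # first sync runs at t=1500
s3_mtime = upload_time      # S3 stamps the object with the upload time

# A second sync under an exact-timestamps comparison sees a mismatch:
needs_reupload = (local_mtime != s3_mtime)
print(needs_reupload)  # -> True: the timestamps can never match
```

This is the maintainers' stated rationale for limiting the flag to downloads, where the CLI can set the local file's mtime to match the S3 object.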

@kyleknap kyleknap added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Sep 4, 2019
@danil-smirnov
Author

Hi @kyleknap!
The --exact-timestamps flag sounds to me like "overwrite even if the sizes are the same", and effectively it seems to work this way.

I assume the flag is quite important for avoiding cases like those listed in this issue.

Sample scenario: we have processes that write files locally and sync them to S3, separated in time (e.g. some async CI/CD pipeline). The first process writes a file to local folder1/, then the second process writes a same-sized file with different content to folder2/. An async process syncs folder1/ to the S3 folder/. After that, another async process tries to sync folder2/ to the S3 folder/. In this scenario the second sync won't overwrite the file, because folder2/'s copy is older than the one in folder/.

In this case using --exact-timestamps flag would resolve the issue.
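The scenario above can be reproduced end to end with a small simulation (no AWS involved; `should_newer_copy` is a stand-in for the CLI's default comparator, and the "bucket" is just a local file whose mtime is set to the upload time):

```python
import os, shutil, tempfile

def should_newer_copy(src, dst):
    """Stand-in for the default sync rule: copy only if sizes differ or src is newer."""
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or s.st_mtime > d.st_mtime

tmp = tempfile.mkdtemp()
f1, f2 = os.path.join(tmp, "file1"), os.path.join(tmp, "file2")
with open(f1, "w") as fp:
    fp.write("content-A")       # folder1/'s file, written at t=100
with open(f2, "w") as fp:
    fp.write("content-B")       # folder2/'s file: same size, written at t=200
os.utime(f1, (100, 100))
os.utime(f2, (200, 200))

# First sync uploads folder1's file; S3 would stamp the object with the
# upload time, here t=300 (later than folder2's write).
dest = os.path.join(tmp, "dest")
shutil.copy(f1, dest)
os.utime(dest, (300, 300))

# Second sync from folder2/: same size, destination "newer" -> skipped,
# so the stale content-A stays in the bucket.
print(should_newer_copy(f2, dest))  # -> False
```

Under the default rule the bucket is left holding the outdated content, which is what an exact-timestamps comparison on uploads would prevent.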

Also, it seems really counterintuitive to me that --exact-timestamps is quietly ignored when copying from local to S3, without any warning. If this flag is firmly intended for downloads only, I would like to see a warning when it is used for an upload, to avoid confusing users.

Thank you for reading

@no-response no-response bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Sep 5, 2019
@lqueryvg

lqueryvg commented Dec 16, 2019

I completely agree with @danil-smirnov that the AWS CLI should not quietly ignore --exact-timestamps when the target is an S3 bucket.

My preference would be for it to fail with a usage error.
Second best would be to at least print a warning.

There are probably lots of people (like me) using this flag, thinking it's helping to solve their sync-to-s3 issues, when in fact it's only giving them a false sense of security.

@ryansmithevans

ryansmithevans commented Jul 22, 2020

This is one of the most frustrating behaviours of sync I've come across, with --exact-timestamps only applying to downloads to local.

I have a version file which I CAN'T update with sync because the original file and the new file are the same size, but have different contents.

I'd like to roll back to a previous version for QA purposes and I'm unable to do this, because the newer file takes priority if the sizes are the same.

The effect of this is that moving from version 1.0.4 to 1.0.3 in my case can't be done: the file used to determine the version is the same size and has different content, but won't be replaced because the previous version is 'older'.

@xpaulz

xpaulz commented Aug 7, 2020

My use case is that I want to sync from the artifacts in my dev account to my local disk, make a couple of changes for tracking version history etc., then upload the local disk to the prod account. But when the prod account has a file that's newer than what I want to push from my dev account, aws s3 sync won't cut it.

@kdaily kdaily added feature-request A feature should be added or improved. s3 s3sync s3syncstrategy labels Aug 17, 2020
@aberkvam

This is amazing. There's no way to run "aws s3 sync" to S3 in a way that guarantees that the local files match the remote ones? That makes this command kind of useless. Is there any better way to handle this other than deleting all the files and re-uploading?

da70 added a commit to NYULibraries/dlts-finding-aids-dev-web that referenced this issue Sep 20, 2021
…3 sync --exact-timestamps flag ignored for uploads #4460](aws/aws-cli#4460)
da70 added a commit to NYULibraries/dlts-open-square-search-application that referenced this issue Sep 20, 2021
…ies of Open Square site assets and book cover thumbnails [+]

For details, see:

* [s3 sync --exact-timestamps flag ignored for uploads #4460](aws/aws-cli#4460)
* [NYUP-594: Open Square search: get static site assets and cover thumbnails working with Vue webpack\-dev\-server](https://jira.nyu.edu/browse/NYUP-594)
* comment in [NYUP-547: Create Open Square search application](https://jira.nyu.edu/browse/NYUP-547?focusedCommentId=114461&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-114461)
da70 added a commit to NYULibraries/dlts-readium-js-viewer that referenced this issue Sep 20, 2021
…- see [s3 sync --exact-timestamps flag ignored for uploads #4460](aws/aws-cli#4460)
da70 added a commit to NYULibraries/dlts-enm-search-application that referenced this issue Sep 20, 2021
… --exact-timestamps flag ignored for uploads #4460](aws/aws-cli#4460)
@michaelwilliams-thermofisher

I have a situation where a target S3 bucket is being synced from 2 different sources. If "source a" uploads a newer existing file to the target, and "source b" has an older file but with a different size, aws sync will overwrite with the older file. How do I avoid this behavior? I want to sync ONLY using newer modified dates.

@tim-finnigan tim-finnigan added the p2 This is a standard priority issue label Nov 10, 2022
@gtownend

Agree that --exact-timestamps would be useful in s3 -> s3 or local -> s3 scenarios, and I'm not sure I understand what prevents it from being included?

The initial response indicates that it would create unnecessary uploads in the case of running multiple syncs with the same source and destination. I understand that limitation but the functionality would still be useful for other use cases. Even with that limitation it's an improvement on the current approaches you'd have to use which all involve running at least 2 commands and uploading everything regardless of whether the timestamps match.

If there's a concern around altering the behaviour of this then why not instead put in a different flag? It seems strange that there's no way to run sync that ensures the content of the destination is what's in the source.

In terms of use case, mine is using one bucket to hold versioned builds of a static website and another bucket to host the website. As pointed out above, rollback is currently impossible with sync, and even updating to a later version can break depending on when the versions are first put in S3 and when they are first moved to the hosting bucket.

As a side note for anyone who comes across this issue: I think first copying with --recursive and then syncing with --delete to clean up is a better workaround than deleting first.

@ThomasMarcelo

ThomasMarcelo commented Jan 13, 2023

How are you guys circumventing this? Do you first remove all files from S3 and then sync?
I'm getting issues whenever my CI/CD pipeline runs a revert operation.

@aberkvam

We switched to using https://rclone.org/ for sync operations. It behaves as expected.

@WonderPanda

I'm also looking for some way to guarantee that the files I'm syncing from local will all be copied to S3 regardless of file sizes or timestamps. I have the folder contents that I'd like to sync and would really like a simple way to push them up.

@WonderPanda

> Agree that --exact-timestamps would be useful in s3 -> s3 or local -> s3 scenarios and I'm not sure I understand what prevents it being included?
>
> The initial response indicates that it would create unnecessary uploads in the case of running multiple syncs with the same source and destination. I understand that limitation but the functionality would still be useful for other use cases. Even with that limitation it's an improvement on the current approaches you'd have to use which all involve running at least 2 commands and uploading everything regardless of whether the timestamps match.
>
> If there's a concern around altering the behaviour of this then why not instead put in a different flag? It seems strange that there's no way to run sync that ensures the content of the destination is what's in the source.
>
> In terms of use case mine is using one bucket to hold versioned builds of a static website and another bucket to host the website as pointed out above currently rollback is impossible with sync and even updating to a later version can break depending on when the versions are first put in S3 and when they are first moved to the hosting bucket.
>
> As a side note for anyone who comes across this issue I think first copying with --recursive and then syncing with --delete to cleanup is a better workaround than deleting first.

I have the exact same setup as you. Any chance you could elaborate on your statement about how doing a copy first helps out this situation?

@gtownend

gtownend commented Aug 8, 2023

> I have the exact same setup as you. Any chance you could elaborate on your statement about how doing a copy first helps out this situation?

@WonderPanda - If I remember correctly, the issue is that existing files are not always replaced when using sync. My workaround is this:

aws s3 cp --recursive s3://$SOURCE_BUCKET s3://$DESTINATION_BUCKET &&
aws s3 sync --delete s3://$SOURCE_BUCKET s3://$DESTINATION_BUCKET

The copy command ensures every file in source is now in destination.
The sync command then removes any files in destination that aren't in source.

FWIW we've been using that since my original comment without issue.
