
S3 sync based on MD5 diff #7011

Closed
1 of 2 tasks
muliyul opened this issue Jun 3, 2022 · 2 comments
Labels
duplicate This issue is a duplicate. feature-request A feature should be added or improved. s3

Comments


muliyul commented Jun 3, 2022

Describe the feature

When syncing a local copy to a bucket, I would like to upload only the files that have changed. This is supported today, but it does not work for some use cases.

Use Case

We keep a number of CSVs in a repository that a GitHub workflow syncs to S3. The --size-only flag works as long as the file size has changed, but if a typo is fixed without changing the size (say the typo is hacc and the pushed fix is hack), the change is not picked up by the CLI and that file is not synced.
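A minimal sketch of the failure mode, illustrative only, using Python's hashlib: the fixed file is exactly the same length as the typo'd one, so a size-only comparison treats it as unchanged, while a content hash would flag the difference.

```python
import hashlib

# The typo'd content and the fixed content are the same number of bytes,
# so a size-only comparison considers the file unchanged.
typo = b"hacc\n"
fix = b"hack\n"

print(len(typo) == len(fix))  # True  -> `sync --size-only` skips the upload
print(hashlib.md5(typo).hexdigest() == hashlib.md5(fix).hexdigest())  # False -> an MD5 diff would catch it
```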

Proposed Solution

Introduce a flag (--md5, for example) that calculates an MD5 for every local file and compares it to the remote copy's MD5. Only files with different hashes should be synced.
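A rough sketch of that comparison, written against boto3 purely for illustration (the actual change would live in the CLI's sync logic). It assumes single-part, non-KMS uploads, where the object's ETag is the hex MD5 of its content; the bucket, key, and file path below are hypothetical placeholders.

```python
import hashlib

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def local_md5(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so large files don't need to fit in memory."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

def needs_upload(bucket, key, path):
    try:
        head = s3.head_object(Bucket=bucket, Key=key)
    except ClientError:
        return True  # no remote copy yet, so always upload
    # For single-part, non-KMS uploads the ETag is the object's hex MD5.
    remote_md5 = head["ETag"].strip('"')
    return local_md5(path) != remote_md5

# Hypothetical bucket/key/path names, for illustration only.
if needs_upload("my-bucket", "data/report.csv", "data/report.csv"):
    s3.upload_file("data/report.csv", "my-bucket", "data/report.csv")
```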

Other Information

This may not be very efficient for large files, which should probably be mentioned (and discouraged) in the documentation.

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CLI version used

2.3.2

Environment details (OS name and version, etc.)

Ubuntu latest (GitHub workflow default runner)

@muliyul muliyul added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Jun 3, 2022
@kdaily kdaily added s3 and removed needs-triage This issue or PR still needs to be triaged. labels Jun 3, 2022
Member

kdaily commented Jun 3, 2022

Hi @muliyul,

Thanks for your comment. This feature was requested quite some time ago, but as you note there are performance concerns, not only for large files but also when there are many files. See the issue here: #599

There is a possibility of supporting the new S3 checksums feature in the AWS CLI high-level S3 commands:

#6750

You can read about that feature announcement here:

https://aws.amazon.com/blogs/aws/new-additional-checksum-algorithms-for-amazon-s3/
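For illustration, a hedged sketch of that additional-checksums feature driven through boto3 directly (exposing it through the high-level `aws s3` commands is what #6750 tracks); the bucket and key names below are placeholders.

```python
import base64
import hashlib

import boto3

s3 = boto3.client("s3")

with open("data/report.csv", "rb") as f:
    body = f.read()

# Ask S3 to store a SHA-256 checksum alongside the object at upload time.
s3.put_object(
    Bucket="my-bucket",
    Key="data/report.csv",
    Body=body,
    ChecksumAlgorithm="SHA256",
)

# Later, fetch the stored checksum without downloading the object and
# compare it against the local file's checksum (S3 returns it base64-encoded).
head = s3.head_object(Bucket="my-bucket", Key="data/report.csv", ChecksumMode="ENABLED")
local = base64.b64encode(hashlib.sha256(body).digest()).decode()
print(head.get("ChecksumSHA256") == local)
```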

I'm going to close this as a duplicate. If #6750 would solve your use case, please 👍🏻 the initial comment.

Thanks!

@kdaily kdaily closed this as completed Jun 3, 2022
@kdaily kdaily added the duplicate This issue is a duplicate. label Jun 3, 2022

github-actions bot commented Jun 3, 2022

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
