s3 sync from local source directory that doesn't exist behavior #1564

Closed
prsullivan opened this issue Oct 16, 2015 · 6 comments
Labels
bug, s3sync

Comments

@prsullivan

Well, I think I totally borked myself. I was using a cron job to run s3 sync on a nightly basis to transfer files from a local directory on a mounted secondary hard drive to an S3 bucket. Last week this hard drive crashed and would not mount. I thought that since the local directory didn't exist, the s3 sync command would error out. Noooooope. It synced an empty directory, and to complete the perfect storm, my command included the --delete option, which totally wiped out my backup. Is this intended behavior?

@JordonPhillips added the bug, s3sync, and confirmed labels on Oct 19, 2015
@JordonPhillips
Member

I'm very sorry you had this issue; I know how frustrating it can be to lose your backups. I tested this out myself and can confirm the behavior. For posterity, here's an example command:

aws s3 sync --delete ~/does/not/exist s3://test-bucket

I did some brief investigating into other solutions in case something accidentally gets updated. Unfortunately it seems that there is no way to have an 'undelete' on s3 without mirroring. There is versioning available, but previous versions crop up when you delete the new ones, so in your case all your files would eventually get deleted anyway.

Off the top of my head, I could imagine a scenario where you replicate all your writes (possibly using Lambda) to a different s3 bucket whose life cycle policy immediately archives the files to Glacier. Glacier storage is vastly cheaper, so the cost wouldn't be too bad.

Still, the best solution is to fix the bug. We'll definitely be looking into it. In the meantime, could you give me the output of aws --version?

@prsullivan
Author

Here is the output:

sullivan@homeserv:~$ aws --version
aws-cli/1.7.32 Python/2.7.6 Linux/3.16.0-50-generic

Thanks

@kyleknap
Contributor

So the behavior is intended in terms of how delete works. No files were identified in the local directory and there were files in the S3 bucket, so those were deleted because they existed in the bucket but not on the local file system.

Erroring out on nonexistent directories and files has caused issues in the past: #856

Also, if you are syncing to a nonexistent folder, it is valid to do something like this:

$ aws s3 sync s3://mybucketfoo1/ nonexistentfolder
warning: Skipping file /Users/kyleknap/Documents/GitHub/aws-cli/foo/nonexistentfolder/. File does not exist.
download: s3://mybucketfoo1/bar.txt to nonexistentfolder/bar.txt

The best we could do to help prevent others from running into a similar situation is to check the path passed in (and make sure that it exists), but only when the source is a local filesystem path. This may complicate the logic a bit, and we would also need to make sure that we do not introduce any backwards incompatibilities.
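
For readers following along, the check described above can be sketched roughly as below. This is only an illustration in shell, not the code from the CLI or from any PR; the function names is_s3_uri and check_sync_source are made up for this example. The idea is that only a local source is validated, so downloading into a folder that does not exist yet keeps working.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "validate the source only when it is local".
# is_s3_uri and check_sync_source are illustrative names, not aws-cli internals.

# Return 0 if the argument looks like an S3 URI.
is_s3_uri() {
    case "$1" in
        s3://*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Return non-zero only when the source is a local path that does not exist.
# S3 sources are never checked here, and destinations are not checked at all.
check_sync_source() {
    local src="$1"
    if is_s3_uri "$src"; then
        return 0
    fi
    if [ ! -e "$src" ]; then
        echo "error: source path does not exist: $src" >&2
        return 1
    fi
    return 0
}
```

With a check like this in front of the sync, a missing local source would stop the run before any --delete logic is reached, while an S3 source syncing to a new local folder would pass through untouched.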

@prsullivan
Author

I imagine it would also be very simple to add some logic to my cron job to check whether the directory exists. I should probably have been doing something to check the health of the drive before running a backup command anyway.
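
A minimal sketch of such a guard, assuming a bash wrapper script is what cron runs. The function names, the example path /mnt/backupdrive/data, and the bucket name are all hypothetical placeholders, not values from this thread. It also refuses to run on an empty directory, since an unmounted drive typically leaves an empty mount point behind.

```shell
#!/usr/bin/env bash
# Guarded backup sketch: refuse to sync when the source looks wrong.
# All names and paths here are hypothetical placeholders.

# Return 0 only if the directory exists and holds at least one entry.
source_is_ready() {
    local dir="$1"
    [ -d "$dir" ] || return 1
    # An empty listing suggests the drive did not mount.
    [ -n "$(ls -A "$dir" 2>/dev/null)" ] || return 1
    return 0
}

# Run the sync only when the source passes the check above.
guarded_sync() {
    local src="$1" dest="$2"
    if ! source_is_ready "$src"; then
        echo "backup skipped: $src is missing or empty" >&2
        return 1
    fi
    aws s3 sync --delete "$src" "$dest"
}

# Example entry point for the cron job (uncomment and adjust):
# guarded_sync "/mnt/backupdrive/data" "s3://my-backup-bucket"
```

The empty-directory check is a deliberate trade-off: it would also block a legitimate sync of a directory that really has been emptied, which seems like the safer failure mode when --delete is involved.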

@kyleknap
Contributor

Yeah, I would also recommend looking into versioning to be safe. Here are some docs on it: http://docs.aws.amazon.com/AmazonS3/latest/UG/lifecycle-configuration-bucket-with-versioning.html. I believe you can set up a lifecycle rule such that old versions get deleted after they have been noncurrent for a specific amount of time.
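
As a rough sketch of that setup with current AWS CLI versions (the s3api subcommands used here may not be available in older releases such as the 1.7.32 mentioned above): the bucket name is supplied by the caller, and the rule ID and the 30-day window are arbitrary values chosen for illustration. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: enable versioning and expire noncurrent versions after a while.
# The rule ID and the 30-day window are illustrative assumptions.

# Write a lifecycle configuration that removes noncurrent versions
# 30 days after they stop being the current version.
write_lifecycle_config() {
    cat > "$1" <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-noncurrent-versions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
    }
  ]
}
EOF
}

# Apply versioning and the lifecycle rule to the given bucket.
enable_versioned_backup() {
    local bucket="$1" config
    config="$(mktemp)"
    write_lifecycle_config "$config"
    aws s3api put-bucket-versioning --bucket "$bucket" \
        --versioning-configuration Status=Enabled
    aws s3api put-bucket-lifecycle-configuration --bucket "$bucket" \
        --lifecycle-configuration "file://$config"
    rm -f "$config"
}

# Example (uncomment and adjust the bucket name):
# enable_versioned_backup "my-backup-bucket"
```

With versioning enabled, a delete issued by sync --delete should leave the earlier versions in the bucket as noncurrent versions until the lifecycle rule expires them, which gives a window to notice and recover from a bad run.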

@kyleknap
Contributor

Merged PR #1575, which fixes the issue. Closing out the issue.
