aws s3 sync from s3 to local disk SCRAMBLED MY SYSTEM FILES! #1174

Closed
sbimikesmullin opened this issue Feb 23, 2015 · 3 comments

Comments

@sbimikesmullin

My command was:

cd /data; mkdir aj-dynamo-backups/; aws s3 sync s3://aj-dynamo-backups/ aj-dynamo-backups/

NOTE: I ran it several times, because I noticed a few files appeared to be downloaded on each run -- it's like it skips some files during the download, and you have to keep running it until it stops reporting new downloads to be sure you got them all. I waited for it to complete between each run. It was inside a 64-bit VirtualBox VM with Ubuntu 14.04, in case that matters.
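As a rough sketch, the "run until nothing new is downloaded" loop I was doing by hand looks something like this (assuming a pass of aws s3 sync prints nothing when there is nothing left to transfer):

# repeat the sync until a pass transfers nothing (i.e. prints no output)
while true; do
    out=$(aws s3 sync s3://aj-dynamo-backups/ aj-dynamo-backups/)
    [ -z "$out" ] && break
done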

For the next few days afterward, I would see random errors like these when starting gvim:

Fontconfig error: "/etc/fonts/conf.d/30-urw-aliases.conf", line 1: not well-formed (invalid token)
Fontconfig error: "/etc/fonts/conf.d/40-nonlatin.conf", line 1: syntax error
Fontconfig error: "/etc/fonts/conf.d/45-latin.conf", line 1: not well-formed (invalid token)
Fontconfig error: "/etc/fonts/conf.d/49-sansserif.conf", line 1: not well-formed (invalid token)
Fontconfig error: "/etc/fonts/conf.d/50-user.conf", line 1: syntax error
Fontconfig error: "/etc/fonts/conf.d/51-local.conf", line 1: syntax error
Fontconfig error: "/etc/fonts/conf.d/57-dejavu-sans-mono.conf", line 1: syntax error
Fontconfig error: "/etc/fonts/conf.d/57-dejavu-sans.conf", line 1: syntax error
Fontconfig error: "/etc/fonts/conf.d/57-dejavu-serif.conf", line 1: not well-formed (invalid token)
Fontconfig error: "/etc/fonts/conf.d/58-dejavu-lgc-sans-mono.conf", line 1: not well-formed (invalid token)

and errors like these when I tried to solve the problem with apt-get update && apt-get dist-upgrade:

http://i.imgur.com/pcXTz5M.png

Upon viewing one of the files in less, such as /etc/modprobe.d/alsa-base.conf, I see:

http://i.imgur.com/tRyd3y2.png

...clearly a portion of my DynamoDB dump, in the tell-tale quasi-JSON format the DynamoDB exporter writes to S3.

I could not finish my dist-upgrade due to this issue. I am mind-boggled. Some questions are:

  1. Why would it overwrite a file in /etc/?
  2. Why would it only write a portion of the data to that file, and not the entire volume? (Each file in S3 is written as a ~256 MB volume.)
  3. Why would it keep the same filename, when there were no matching paths or filenames in the S3 bucket I downloaded from? i.e., /etc/modprobe.d/alsa-base.conf
  4. Is the file size basically the same number of bytes as it was in the original/normal plain-text format? Did something ultra-low-level happen here to overwrite just the portion of the filesystem that represented this file? Here's what the file looks like after corruption:
$ ll /etc/modprobe.d/alsa-base.conf
-rw-r--r-- 1 root root 2507 Feb 13  2013 /etc/modprobe.d/alsa-base.conf
  5. How did it bypass filesystem permissions to overwrite these /etc/ files? I am 100% positive I ran the command as the non-root user developer. The user does have passwordless sudoers access; that is the nearest thing I can think of. But I did not use sudo to run the aws cli command, as you can see in my example above. Here you can see the directory permissions: /data is owned by developer, while /etc/ is owned by root and writable only by root:
$ ll /data
total 12
drwxr-xr-x  3 developer developer 4096 Feb 18 10:10 ./
drwxr-xr-x 25 root      root      4096 Feb 23 13:45 ../
drwxrwxr-x  4 developer developer 4096 Feb 18 12:10 aj-dynamo-backups/
$ ll /etc
total 1304
drwxr-xr-x 146 root root       12288 Feb 23 13:48 ./
...

I haven't counted how many files are affected; I saved a snapshot of the VM for forensic analysis. However, it seems like potentially hundreds of files under /etc/ are affected.
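For a rough count, something like this should work (DUMP_MARKER here is just a placeholder for some distinctive string copied by hand out of the dump contents):

# count readable text files under /etc that contain the dump marker
$ sudo grep -rlI 'DUMP_MARKER' /etc | wc -l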

I am afraid to use aws s3 sync in general now. It seems like whatever happened was a catastrophic-level bug, or else severe VM corruption. But I find the latter difficult to believe given the way each file was corrupted. It seemed somewhat deterministic: the filesystem was still readable, filenames and paths were still intact, and the contents were readable by less and still legible/recognizable as DynamoDB dump output.

On the plus side, there is 48 GB of data downloaded to my /data/aj-dynamo-backups/ dir. It appears to be the complete copy I intended to receive, but I don't know how I would verify that byte-for-byte without some kind of recursive S3 directory checksum.
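The closest I can think of is a rough size comparison plus a dry-run sync. This is only a sketch and not truly byte-for-byte (the sync comparison is size/mtime based, and multipart ETags are not plain MD5s):

# total object bytes reported by S3 for the bucket
$ aws s3 ls --recursive s3://aj-dynamo-backups/ | awk '{sum += $3} END {print sum}'
# total file bytes in the local copy
$ find /data/aj-dynamo-backups/ -type f -printf '%s\n' | awk '{sum += $1} END {print sum}'
# a dry-run sync lists anything the CLI still considers out of date
$ aws s3 sync s3://aj-dynamo-backups/ /data/aj-dynamo-backups/ --dryrun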

$ aws --version
aws-cli/1.3.22 Python/2.7.6 Linux/3.13.0-30-generic

Just a warning to other users. I am solving the issue by rolling back my VM to an earlier snapshot. I am doubting whether I can rely on this to play a role in the long-term DynamoDB backup/mirror solution I had intended. Any help appreciated.

@sbimikesmullin
Author

I rolled back to Jan 21 (before the aws s3 sync) and confirmed things work again. I am now able to complete a dist-upgrade and am also updating my installed aws-cli version. I will update my VirtualBox and installed Guest Additions versions as well.

FYI, my Oracle VirtualBox version is 4.3.12 r93733 and I run it on a Windows 8.1 64-bit host. I have 4 physical cores and had given the VM 100% of them.

The more I think about this, the more bizarre it is that any non-root process could write to a root-only file. It seems like the kernel would have stopped it from writing to a root-only file, if the write had gone through the kernel.

Also, it was a lot of data to write to the .vdi (48+ GB).

I suppose the only explanation that does make sense is VirtualBox writing out of bounds to the .vdi file on my Windows host. I'm not sure how the ext filesystem works, or whether that could possibly result in the type of file corruption I described above.

Another thing I noticed while running aws s3 sync is that it seemed computationally expensive; it kept all my cores busy, as if it was downloading multiple files in parallel using different threads/processes. In tmux inside the VM I saw the CPU maxing out, and that would have been 100% of my physical cores.

There also appear to be similar, unresolved issues documented against older versions of VirtualBox:

https://www.virtualbox.org/ticket/10031
https://bugzilla.kernel.org/show_bug.cgi?id=16165

So it's probably safe to run outside of VMs, or on VMs with more resources. Most likely this is a very specific evil version combination I have, at either the Windows or the VirtualBox level.

I will leave this issue in case anyone else reports something similar in the future.

@kyleknap
Contributor

Sorry to hear that happened to you. We have never seen anything like this before, nor does it really make sense from an AWS CLI developer's point of view.

Besides the fact that it does not make sense that the CLI would write to a directory that was not even part of the targeted path, the CLI only uses the permissions of the user it is run by. Given the version of the CLI you were using, the CLI would actually error out if the user did not have access to the file. In the current version of the CLI, 1.7.11, we actually check whether we have access and skip the file if we do not, instead of failing completely.

It sounds like you updated to the latest CLI, which is good. As for cutting down on the computational cost of s3 sync, take a look at this PR: #1122. We are working on adding documentation for it, but the PR provides a decent description. You can configure the CLI to use fewer threads for s3 operations. By default, I believe it uses 10 threads, so you could drop that down to something lower.
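As a rough sketch of what that would look like (assuming the setting from that PR ends up exposed as max_concurrent_requests in ~/.aws/config):

# ~/.aws/config
[default]
s3 =
    max_concurrent_requests = 3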

Let me know if you run into this issue again or have any other questions.

@kyleknap
Contributor

@sbimikesmullin

I am closing this issue since it seems that nobody else has run into it. If you or anyone else does run into this issue, please reopen.
