Welcome to sitar, a simple program to create incremental backups on Amazon S3 using tar(1).
- Incremental backups - after a full backup, sitar backs up only changed files. That means faster backups, less bandwidth used for saving backups, and less space used by your S3 buckets.
- Standards compatibility - sitar uses GNU tar(1) for all backups. This allows you to use the tools you are already familiar with to manage your backups. To restore a backup, all you need to do is download the files from S3 and "untar" them. You do not even need sitar to restore.
- Stream directly to S3 - you do not need disk space for temporary backup files, since the data is streamed directly into S3 files.
- Level resetting - sitar allows you to reset your next backup levels at any time, for maximum control of incremental size/time vs. faster restores.
- GNU tar
- AWS CLI properly installed and configured
This program is still in beta stage. Please test your backups carefully before deploying to production. Use at your own risk.
Run the following commands to install sitar in your system:
sudo wget -q -O /usr/local/bin/sitar https://raw.githubusercontent.com/flaviovs/sitar/master/sitar.sh
sudo chmod +x /usr/local/bin/sitar
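To confirm the script is installed and on your PATH, a quick check such as the following can be used:
command -v sitar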
Before using sitar, you must ensure that you have AWS CLI properly installed and configured. You can check if AWS CLI is installed and configured by issuing the following command:
aws s3 ls
You should see a list of all your S3 buckets. Check https://aws.amazon.com/cli/ for more details about installing and configuring AWS CLI.
sitar [-C COMPRESS] DIRECTORY s3://BUCKET/PATH/... [EXTRA-TAR-OPTIONS]
- COMPRESS - the compression method for the backups. Can be one of gzip, bzip2, xz, or none. If not specified, bzip2 will be selected if bzip2(1) is available, otherwise gzip.
- DIRECTORY - the directory you want to back up.
- s3://BUCKET/PATH/... - the bucket and object path in S3 where backup files should be saved.
- EXTRA-TAR-OPTIONS - extra options to be passed to tar(1). IMPORTANT: some tar(1) options can badly confuse sitar. For example, using --xz when sitar expects a bzip2(1) backup will probably cause trouble. Generally, --ignore* options and other options that do not change paths or compression methods are safe.
Example command line:
sitar / s3://my-bucket/backups --exclude-backups --exclude-vcs-ignores
The command above will back up your entire directory hierarchy to the path /backups in the S3 bucket my-bucket. The tar(1) command will receive the parameters --exclude-backups --exclude-vcs-ignores (see tar(1) for more information about tar options).
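For example, to force xz(1) compression for a backup of /home (the bucket name is illustrative):
sitar -C xz /home s3://my-bucket/home-backups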
sitar provides two mechanisms to ignore files:
- .sitarignore - sitar makes tar(1) read glob patterns from .sitarignore files encountered while backing up, and ignore files matching the patterns. For example, the following /.sitarignore file will ignore some system directories (a sketch of how to create these files follows below):
./mnt
./proc
./run
./sys
- .sitarskip - .sitarskip files make tar(1) completely skip all files and directories at or below the directories containing them. For example, to avoid backing up /var/tmp, you can do:
touch /var/tmp/.sitarskip
Note: GNU tar (as of 1.30) will emit a warning for each .sitarskip file it encounters while doing backups.
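As an illustrative sketch (the paths are only examples, and one glob pattern per line is assumed), the ignore files above could be created like this:
# Put one exclusion pattern per line in /.sitarignore
printf '%s\n' ./mnt ./proc ./run ./sys > /.sitarignore
# Make sitar skip everything at or below /var/tmp
touch /var/tmp/.sitarskip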
sitar will keep incrementing backup levels indefinitely, so that new backups only contain data about files created/updated/deleted since the last operation. This is optimal from an S3 space and bandwidth perspective, but having lots of incremental backups can be problematic:
- Backup levels keep track of file deletions using tar(1) snapshots in the next level. That means that your deleted files will stay on S3 as long as the current backup level is higher than the one used to back up the file. Also, deleted files need to be "restored" and then removed during a full restore, which might make the operation significantly slower.
For example, suppose that a file large.zip is backed up on level 5. The next day the file is deleted, so it is not included in the next (level 6) daily backup. In this scenario, large.zip will still be present in your S3 backup, technically forever. Moreover, if you need to do a full restore, the backup file containing large.zip will need to be downloaded and unpacked (i.e. you will need the disk space), even though the file will not be present when you finish the full restore.
- Since you need all incremental backups to restore, the more files you have, the more data you need to download during a restore, and the more files you will need to process.
To allow you to control your backup levels and avoid any potential issues, sitar allows you to reset the backup level for subsequent backups.
To do that, just save a file named SITAR-RESET.txt in the backup path on S3. The file should contain a single line with the number of the backup level you want to reset to.
For example, to reset to level 3, you can use the following AWS CLI command:
echo 3 | aws s3 cp - s3://my-bucket/path/SITAR-RESET.txt
The number specified in the SITAR-RESET.txt tells sitar which level should be considered the current backup level during the next backup. The program will delete all obsolete backups after resetting the level.
Note that sitar resets the current backup level. If you reset to 0 you will effectively delete all your incremental files -- incremental backups will then start over. Also note that the reset mechanism is not meant to redo a full backup. To do that, just rename or remove the backup path in S3 and re-run sitar.
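For instance, to start your incremental backups over at level 0 (which, as noted above, effectively deletes all existing incremental files), a similar command can be used:
echo 0 | aws s3 cp - s3://my-bucket/path/SITAR-RESET.txt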
The program first does a full backup using tar(1), and incremental backups after that.
During the first backup, all files are backed up and saved in a file named full.tar.bz2 (or full.tar.gz, if bzip2(1) is not available) on the provided S3 path. A tar(1) snapshot is generated and archived (among other information) in a .sitar file, which is also uploaded to the backup path in S3.
Subsequent backups are done by fetching the .sitar file and using previous tar(1) snapshots to configure the next backup level. Incremental backups are saved in files named inc-SEQUENCE-LEVEL.tar.* where:
- SEQUENCE - the incremental backup sequence number. Incremental backups should be restored in the right sequence, and the SEQUENCE number can be used to ensure the right ordering.
- LEVEL - the level of the incremental backup. Informative only; it can be useful in disaster recovery situations, in case someone messes up the backup files.
After an incremental backup is finished, the .sitar file is updated and uploaded to S3.
A README.txt file is also created with basic instructions for restoring the backup.
IMPORTANT: do not delete the .sitar file present on your S3 backup path. Your current backup will not be affected, but the program will refuse to do new backups if you do this.
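As a rough illustration (the exact SEQUENCE numbering and padding may differ), a backup path holding one full and two incremental bzip2(1) backups might contain files such as:
.sitar
README.txt
full.tar.bz2
inc-1-1.tar.bz2
inc-2-2.tar.bz2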
- Download the full.tar.bz2 file and all inc-* files. This can be accomplished with the following AWS CLI command:
aws s3 cp --exclude=.sitar --recursive s3://my-bucket/path .
- Restore the full backup:
tar xf full.tar.bz2 --bzip2 --listed-incremental=/dev/null -C /tmp/restore
- Restore all incremental backups:
LC_ALL=C ls inc-* | while read file; do tar xf "$file" --bzip2 --listed-incremental=/dev/null -C /tmp/restore; done
Notes:
- The examples assume that you used bzip2(1) for your backups. For gzip(1) or xz(1) (or uncompressed) backups, you must adjust options and file names accordingly.
- You will need to restore incremental backups manually if you switched compression methods after some incremental backups were already done, since your inc-* files will have different extensions/compression methods. The loop above will not work if your incremental backups have mixed compression methods (see the sketch after these notes for one way to handle this).
- Do not forget the --listed-incremental=/dev/null option. Your backup will not restore correctly if you omit it.
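As a hedged sketch (not a sitar feature), a restore loop along these lines could pick the decompression option from each file's extension when compression methods are mixed:
LC_ALL=C ls inc-* | while read file; do
    # Choose the tar compression flag based on the file extension
    case "$file" in
        *.tar.gz)  opt=--gzip ;;
        *.tar.bz2) opt=--bzip2 ;;
        *.tar.xz)  opt=--xz ;;
        *)         opt= ;;
    esac
    tar xf "$file" $opt --listed-incremental=/dev/null -C /tmp/restore
done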
Use the AWS environment variable to customize the location of the AWS CLI executable:
AWS="$HOME/bin/aws" sitar / s3://my-bucket/path
You can also use the AWSCLI_EXTRA environment variable to pass extra options to the AWS CLI command used to push backup data to S3. For example, to use the Standard IA storage class for your backups by default, you can use the following:
AWSCLI_EXTRA="--storage-class=STANDARD_IA" sitar / s3://my-bucket/path
- Streaming to S3 will fail if a single backup file is larger than 5 GB. To work around this, use the AWSCLI_EXTRA environment variable (see above) to pass --expected-size=SIZE to AWS CLI, where SIZE is a rough estimate of your backup size (it just needs to be a little bigger than the data being uploaded). See https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html for more details. An example is shown after this list.
- sitar does not do any locking of backup paths in S3. Make sure you are not running two or more backups pointing to the same S3 path, otherwise bad things will happen to your data.
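For example, assuming a backup expected to total no more than roughly 20 GB, the streaming size limit above could be worked around like this (the SIZE value is in bytes and purely illustrative):
AWSCLI_EXTRA="--expected-size=21474836480" sitar / s3://my-bucket/path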