This is seriously forked

This repo is forked from the AmericanResearchInstitute/ari-backup repo, but it should be considered the canonical repo for ari-backup. ARI open-sourced the original codebase long ago and went out of business soon thereafter. There is unlikely to be any further development on this project under the AmericanResearchInstitute organization, but development on ari-backup within this repo is somewhat active. For more ari-backup history, see the last section in this README.

ari-backup

ari-backup is a lightweight generic workflow engine designed specifically for running automated backups. It includes modules with support for running backups using rdiff-backup, rdiff-backup with LVM snapshots, and syncing files to ZFS datasets using rsync. Features include:

  • Centralized configuration
  • Support for backing up local and remote hosts
  • Configurable job parallelization
  • Ability to run arbitrary commands locally or remotely before and/or after backup jobs, which is especially handy for preparing databases pre-backup
  • Logging to syslog

ari-backup was originally written to automate rdiff-backup jobs, and that remains its main focus, but over time support for other backup types has been added. The workflow engine's architecture is designed to be extended so that new backup types can be added easily.

This application is lightweight mostly thanks to leveraging common system tools to provide most of the facilities necessary to run a backup system. cron is used to schedule backup jobs, xargs is used to optionally run jobs in parallel, run-parts is used to execute individual backup jobs, and ssh is used for authentication and secure data transport.

Audience

This README and the ari-backup documentation expect that the reader has a basic understanding of Linux, file system semantics, and how to install system and Python packages. The typical audience for this software is a system administrator who wants to back up several systems with rdiff-backup.

ari-backup was developed on and written for Linux. But there have been reports of its use on Windows using cygwin.

Getting Started

Before you install ari-backup, install its dependencies from your Linux distribution. At minimum you'll need rdiff-backup for the demos below; the lvm and zfs modules additionally rely on your distribution's LVM tools and rsync, respectively.
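For example, on a Debian-based system (season to taste for other distributions):

$ sudo apt install rdiff-backup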

To install the ari_backup package to your system, run this as root:

$ pip install git+https://github.com/jpwoodbu/ari-backup.git

Before you can execute a backup job, there are a few files and directories that need to be set up. At this time, the configuration file for ari-backup is always read from /etc/ari-backup/ari-backup.conf.yaml. For this demo, put this into the ari-backup.conf.yaml file:

backup_store_path: /backup-store

Now create the /backup-store directory.
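For example, as root:

$ mkdir /backup-store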

Our demo will use the most basic example of a backup job: backing up the /music directory to /backup-store/my_backup. Put the following into a file named ari-backup-local-demo:

#!/usr/bin/env python3
import ari_backup

backup = ari_backup.RdiffBackup(label='my_backup', source_hostname='localhost')
backup.include('/music')
backup.run()

Make sure you're logged in as a user with permission to read the /etc/ari-backup/ari-backup.conf.yaml file. Make ari-backup-local-demo executable and run it with some debug flags.
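To make it executable:

$ chmod +x ari-backup-local-demo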

$ ./ari-backup-local-demo --debug --dry_run

The output should look something like this:

ari_backup (my_backup) [INFO] workflow.py:392 Running in dry_run mode.
ari_backup (my_backup) [INFO] workflow.py:393 started                          
ari_backup (my_backup) [INFO] workflow.py:254 processing pre-job hooks...      
ari_backup (my_backup) [INFO] workflow.py:396 data backup started...           
ari_backup (my_backup) [DEBUG] rdiff_backup_wrapper.py:152 _run_custom_workflow started
ari_backup (my_backup) [DEBUG] workflow.py:321 run_command ['/usr/bin/rdiff-backup', '--exclude-device-files', '--exclude-fifos', '--exclude-sockets', '--terminal-verbosity', '1', '--include', '/music', '--exclude', '**', '/', '/backup-store/my_backup']
ari_backup (my_backup) [DEBUG] rdiff_backup_wrapper.py:205 _run_backup completed
ari_backup (my_backup) [INFO] workflow.py:398 data backup complete
ari_backup (my_backup) [INFO] workflow.py:283 processing post-job hooks...     
ari_backup (my_backup) [INFO] workflow.py:411 stopped

You'll notice similar output in your syslog, as all ari-backup messages are logged there too. To see all the available flags for ari-backup job files, use the --help flag.

$ ./ari-backup-local-demo --help

Now let's run the demo for real. Make sure the user you're logged in as also has permission to read the /music directory and to write to the /backup-store/my_backup directory. If all goes well, you should see no console output, but you'll find log messages in your syslog.

Your /backup-store directory should now contain a my_backup directory. Inside it you should see a mirror of your /music directory as well as an rdiff-backup-data directory, which is where rdiff-backup stores its own data such as reverse increments, statistics, and file metadata.

Note that paths passed to include() or exclude() can be either directories or files. They are passed as values to rdiff-backup's --include and --exclude flags, respectively. Paths are passed to rdiff-backup in the order in which they're added, except that all paths passed to exclude() are placed before paths passed to include(). This ensures that everything you wish to exclude is actually excluded, even when an include rule would otherwise match it.
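A minimal sketch of that ordering (the /music/podcasts path is just an illustration):

#!/usr/bin/env python3
import ari_backup

backup = ari_backup.RdiffBackup(label='my_backup', source_hostname='localhost')
backup.include('/music')
backup.exclude('/music/podcasts')
# Even though exclude() was called second, rdiff-backup receives roughly:
# --exclude /music/podcasts --include /music --exclude '**'
backup.run()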

Backing up Remote Hosts

For a more exciting demo, let's back up a remote host. We'll use ssh to authenticate to the remote host; public key authentication is the only method ari-backup supports. Be sure keys are set up both for the user that will run ari-backup and for the user we'll use to connect to the remote host. For this demo, we're going to use the user backups.

The remote system requires very little setup. Once you've got your SSH key installed, the only other step is to install rdiff-backup. ari-backup does not need to be installed on the remote system. Isn't that great!

Make sure that the user running your backup script has the remote host's host key in its known_hosts file. The best way to ensure that it does, and to confirm that public key authentication works, is to log in to the remote system manually.
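For example, with the backups user and the remote host named kif that we'll use in the demo below:

$ ssh backups@kif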

We'll need to add the remote_user setting to our /etc/ari-backup/ari-backup.conf.yaml file. It should now look like:

backup_store_path: /backup-store
remote_user: backups

Let's assume that your remote host is named kif. Make a new backup job file named ari-backup-remote-demo with this content:

#!/usr/bin/env python3
import ari_backup

backup = ari_backup.RdiffBackup(label='kif_backup', source_hostname='kif')
backup.include('/music')
backup.run()

Make ari-backup-remote-demo executable and run it first with the debug flags to see what it will be doing.

$ ./ari-backup-remote-demo --debug --dry_run

If everything looks good, run it without any flags. Again, no console output means everything worked. Check the syslog and your /backup-store/kif_backup directory to see the results. Once your ssh keys are set up, the only thing different about remote backups is the value you put in the source_hostname parameter.

Settings and flags

Once you've got a workable backup script, you can use it to see what command line flags are available. Using the ari-backup-local-demo we made before, run this command line:

$ ./ari-backup-local-demo --helpfull

That will display a list of all available flags, a description for each, their default value, and in which module they're defined. See absl-py flags for more on how to use flags.

The default flag values can be overridden by entries in the /etc/ari-backup/ari-backup.conf.yaml config file, on the command line at runtime, or by assigning new flag values to the backup object before run() is called. By convention, flags are assigned as public attributes of backup objects.

If, for example, you wanted to override the value of the remote_user flag defined in the ari_backup.workflow module, you could define remote_user in the /etc/ari-backup/ari-backup.conf.yaml config like so:

remote_user: backup_user

You can also override it on the command line:

$ ./my_backup_script --remote_user backup_user

Finally, you can override it within the backup job file itself:

#!/usr/bin/env python3
import ari_backup
backup = ari_backup.RdiffBackup(label='mybackup', source_hostname='localhost')
backup.include('/home')
backup.remote_user = 'backup_user'
backup.run()

Using ari-backup with cron

See include/cron/ari-backup for an example script you can use with cron. By default, this script will look for backup jobs in /etc/ari-backup/jobs.d. And by default, this script will only execute one backup job at a time. You can edit the JOBS_DIR and CONCURRENT_JOBS variables in the script to tweak those settings to taste.
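As a rough sketch of how such a script ties run-parts, xargs, and time together (this is only an illustration; see include/cron/ari-backup for the real script):

#!/bin/bash
# Illustrative sketch only; the variable names mirror the real cron script.
JOBS_DIR=/etc/ari-backup/jobs.d
CONCURRENT_JOBS=1

# run-parts --list prints the runnable job files; xargs executes up to
# CONCURRENT_JOBS of them at a time; time reports the total duration.
time (run-parts --list "$JOBS_DIR" | xargs -P "$CONCURRENT_JOBS" -I {} {})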

To put this all together with an example, let's use the two backup job scripts you made before, ari-backup-local-demo and ari-backup-remote-demo. Place them in the /etc/ari-backup/jobs.d directory. Now copy include/cron/ari-backup to /etc/cron.daily (or an equivalent directory on your system). You can now wait for cron to run the script in /etc/cron.daily, or better yet, execute it yourself to test it out.

You can again look at your syslog to see that the backups ran. But you'll also notice that the cron script produces some console output, reporting how long the entire set of jobs took to run. You may see something like this:

real    3m44.318s
user    0m45.595s
sys     0m8.253s

If you have cron set up to email you when there's output like this, then you'll have a handy (or annoying) email reporting whether your backups ran successfully each time.

Be sure that the names of your backup job scripts are compatible with what run-parts expects. See the run-parts man page for more on their filename restrictions.

Pro tip: since run-parts will ignore file names with dots, a simple way to disable a backup job is to prefix a dot to its filename.
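For example:

$ mv /etc/ari-backup/jobs.d/ari-backup-local-demo /etc/ari-backup/jobs.d/.ari-backup-local-demo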

Other modules

lvm

Let's add LVM into the mix so that we can achieve crash-consistent backups. This is done using the lvm module. We'll need to add the snapshot_mount_root and snapshot_suffix settings to our existing /etc/ari-backup/ari-backup.conf.yaml file:

snapshot_mount_root: /tmp
snapshot_suffix: -ari-backup

snapshot_mount_root defines where the temporary snapshots are mounted during the backup (snapshots are automatically removed after the backup completes). snapshot_suffix sets the suffix appended to the snapshot's name, which is helpful when debugging because it makes clear where a snapshot came from.

Let's assume that your remote host is named db-server and that you want rdiff-backup to remove increments older than one month, so you set remove_older_than_timespec='1M'. Specify the LVM volumes and their mount points on the remote system with backup.add_volume() (you may add more than one volume by adding multiple backup.add_volume() statements), and specify the directories to be backed up with backup.include(). Make a new backup job file named ari-backup-remote-lvm-demo with this content:

#!/usr/bin/env python3
import ari_backup

backup = ari_backup.RdiffLVMBackup(
    label='mybackup',
    source_hostname='db-server',
    remove_older_than_timespec='1M'
)

backup.add_volume('vg0/root', '/')
backup.include('/etc')
backup.run()

Specifying mount options

When using LVM snapshots, you can provide specific mount options for ari-backup to use when mounting the snapshots. This is done with an optional string argument to the add_volume method; the string uses the same format for mount options that you would use in fstab. Expanding on the above example:

backup.add_volume('vg0/root', '/', mount_options='noatime,nodiratime')

Known issue with XFS

Mounting a snapshot of an already mounted XFS file system will likely result in an error. See issue #24. To work around this, pass the nouuid mount option.
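For example:

backup.add_volume('vg0/root', '/', mount_options='nouuid')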

ZFS

The zfs module provides a way to back up hosts to a machine that uses ZFS for its backup store. Rather than using rdiff-backup to keep historical datapoints, history is kept in the form of ZFS snapshots. rsync is used to sync files from the source host to the ZFS-based host.

This module was built for a very specific use case which involved first making LVM snapshots on the source host before running the backup. Currently, that is the only use case supported by the zfs module.

An example config using the zfs module:

#!/usr/bin/env python3
import ari_backup

backup = ari_backup.ZFSLVMBackup(
    label='mybackup',
    source_hostname='db-server',
    rsync_dst='zfs-backup-server:/zpool-0/backups/ari-backup/mybackup',
    zfs_hostname='zfs-backup-server',
    dataset_name='zpool-0/backups/ari-backup/mybackup',
    snapshot_expiration_days=60
)

backup.add_volume('vg0/root', '/')
backup.run()

There are a lot of familiar arguments here and a few new ones.

  • rsync_dst: the destination argument passed to the rsync command, in <hostname>:/path/to/backup/dir format.
  • zfs_hostname: the hostname of the machine storing the backups on ZFS. When using ZFSLVMBackup, the backups are not necessarily stored on the machine running ari-backup.
  • dataset_name: the ZFS path to the dataset, in <pool>/path/to/dataset format.
  • snapshot_expiration_days: the number of days after which a snapshot expires and will be destroyed. This is similar to the RdiffBackup class's remove_older_than_timespec argument, but here the value is simply an integer representing a number of days.

Notice that include() was not called. Backing up the entire source file system is implicit; the effect is as if include('/') was called. This limitation is due to this feature being made specifically to meet the needs of its author. Contributions to enhance this module are strongly encouraged! :)

Running commands before or after a backup

Each workflow object has a run_command method that can be used to run commands locally or remotely before or after the backup is run.

Let's say, for example, you want to dump a database to disk before your backup. We can expand on our previous example using the lvm module.

#!/usr/bin/env python3
import ari_backup

backup = ari_backup.RdiffLVMBackup(
    label='mybackup',
    source_hostname='db-server',
    remove_older_than_timespec='1M'
)

backup.add_volume('vg0/root', '/')
backup.include('/etc')

# Dump database to disk to get a consistent copy.
backup.run_command(
    'mysqldump --all-databases > /var/backups/mysql.sql', host='db-server')

backup.run()

In the above example, we're using file redirection to dump the database to a particular path. That's a shell feature, but that's OK because the command will be run through a shell on the remote host via SSH. When running commands locally, if you need the command to be run through a shell, you must pass the command argument to run_command as a string.

# This snippet will be run locally in a shell and will successfully create a
# /tmp/delme file with 'test' inside.
backup.run_command('echo test > /tmp/delme', host='localhost')

# run_command() also accepts a list for the command argument, but will not
# run the command through a shell explicitly. (Remote commands are still run
# through a shell implicitly, because SSH is used to execute the command.)
# This snippet will silently fail to create the /tmp/delme file.
backup.run_command(['echo', 'test', '>', '/tmp/delme'], host='localhost')

In the latter example, everything after 'echo' is passed as an argument to echo, and there is no shell to recognize the file redirection token and actually create the file.

ari-backup internally always passes a list as the command argument to run_command(). This is done for explicitness, and accepting commands as lists can also be useful in some backup configurations. But if what you're running uses shell features, be sure to pass in your command as a string.

If running a command locally, you can either pass 'localhost' as the host argument or leave out the host argument entirely.
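For example, this is equivalent to the first snippet above:

backup.run_command('echo test > /tmp/delme')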

Development

If you are interested in contributing to ari-backup, there are two paths to help you get started quickly: Bazel (preferred) and Vagrant.

Bazel

This project includes a .bazelversion file, so it's recommended to install Bazel with Bazelisk to ensure a compatible version of Bazel is used. Once Bazelisk is installed, you can build and test ari-backup with bazelisk test ... from the top of the repo. Bazel takes care of bringing all the needed dependencies into sandboxes without cluttering up your system.
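From the top of the repo:

$ bazelisk test ...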

Vagrant (legacy)

If you are interested in contributing to ari-backup, there is a Vagrantfile.example to help you get started quickly. All you need is a Linux host that supports libvirt, Ansible, and the vagrant-sshfs plugin. Vagrant allows contributors to get up and running quickly without dirtying their host computer's environment; it automatically configures a guest virtual machine with an ari-backup development setup. Here are example commands for a Fedora host machine (season to taste for other distributions by installing their equivalent packages):

$ sudo dnf install ansible vagrant-libvirt vagrant-sshfs
$ git clone git@github.com:jpwoodbu/ari-backup.git
$ cd ari-backup
$ cp Vagrantfile.example Vagrantfile
$ vagrant up

Once Vagrant finishes provisioning your new development guest, you can ssh into it using the vagrant ssh command. Your checked-out code on the host is shared via sshfs into the /vagrant/ folder on the guest, which means you can use your $EDITOR or IDE of choice on the host and the guest will instantly see the changes. Use the guest to run the tests or to try out your changes. To run the tests, simply execute python /vagrant/setup.py test.

Vagrant guests are meant to be throwaway machines (since the state of your code is still kept on the host) so if you ever get into trouble in the guest, simply run vagrant destroy (on the host) to get rid of it and you can quickly start over.

History and Namesake

ari-backup gets its name from the American Research Institute, where it was originally written in bash. As rdiff-backup was our software of choice for backing up our Linux systems, we needed some sort of scripting around running rdiff-backup on a schedule. We could have written a single script that performed all our backups and placed it in /etc/cron.daily, but that didn't seem scalable and was especially monolithic, since we were backing up about 50 machines.

We liked the idea of separate backup scripts for each backup job. In our case, each job backed up a host. But we didn't want to overcrowd the /etc/cron.daily directory, so we put all our backup scripts in their own directory and put a single file in /etc/cron.daily that called our backups using run-parts. We later cooked in the xargs part that made it easy to run backup jobs concurrently.

When we started to add the LVM snapshot features we decided that porting it to Python was going to make working on this project much easier.

In 2011, ARI graciously open-sourced this software.
