DatuX edited this page Sep 29, 2023 · 68 revisions

Look at the README.md for the introduction.

Getting started

Installation

zfs-autobackup creates ZFS snapshots on a "source" machine and then replicates those snapshots to a "target" machine via SSH.

zfs-autobackup may be installed on either the source machine or the target machine. (Installing on both is unnecessary.)

When installed on the source, zfs-autobackup will push snapshots to the target. When installed on the target, zfs-autobackup will pull snapshots from the source.

Using pip

The recommended installation method on most machines is to use pip:

[root@server ~]# pip install --upgrade zfs-autobackup

The above command can also be used to upgrade zfs-autobackup to the newest stable version.

To install the latest beta version add the --pre option.
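For example (assuming pip is available on the PATH):

```shell
# install or upgrade to the latest pre-release (beta) version
pip install --pre --upgrade zfs-autobackup
```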

Using easy_install

On older machines you might have to use easy_install:

[root@server ~]# easy_install zfs-autobackup

Using the sources

If you don't want to install zfs-autobackup, or you want to make changes to the code, see Development.

Example

In this example, a machine called backup is going to create and pull backup snapshots from a machine called pve01.

Setup SSH login

As zfs-autobackup will perform numerous remote commands via ssh, we strongly recommend setting up passwordless login via ssh. This means generating an SSH key on the target machine (backup) and copying the public key to the source machine (pve01).

NOTE: Most examples use root access on both the source and target. If you want to use a normal user, it's a bit more complex: your user needs read/write access to /dev/zfs, and you need to set up ZFS permissions as well.

Generate an SSH key on backup

Create an SSH key on the backup machine that runs zfs-autobackup. You only need to do this once.

Use the ssh-keygen command and leave the passphrase empty:

root@backup:~# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
...
root@backup:~#
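If you prefer a non-interactive one-liner, ssh-keygen can generate the key without any prompts. This is a sketch, assuming an ed25519 key (the transcript above uses the RSA default):

```shell
# generate a passphrase-less key in one step (skips the prompts above);
# ssh-keygen will still ask before overwriting an existing key file
ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
```

If you use a non-default key file like this, pass -i ~/.ssh/id_ed25519.pub to ssh-copy-id in the next step.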

Copy the SSH key to pve01

Now you need to copy the public part of the key to pve01.

The ssh-copy-id command is a handy tool to automate this. It will just ask for your password.

root@backup:~# ssh-copy-id root@pve01
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
Password:

Number of key(s) added: 1

root@backup:~#

This allows the backup machine to log in to pve01 as root without a password.
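You can quickly verify that passwordless login works. BatchMode makes ssh fail with an error instead of falling back to a password prompt (hostname taken from the example):

```shell
# should print the remote hostname without asking for a password
ssh -o BatchMode=yes root@pve01 hostname
```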

Select filesystems to backup

Next, we specify the filesystems we want to snapshot and replicate by assigning a unique group name to those filesystems.

It's important to choose a unique group name and to use that name consistently. (Advanced tip: if you have multiple sets of filesystems that you wish to back up differently, you can create multiple group names.)

In this example, we assign the group name offsite1 to the filesystems we want to back up.

On the source machine, we set the autobackup:offsite1 zfs property to true, as follows:

[root@pve01 ~]# zfs set autobackup:offsite1=true rpool
[root@pve01 ~]# zfs get -t filesystem,volume autobackup:offsite1
NAME                      PROPERTY             VALUE                SOURCE
rpool                     autobackup:offsite1  true                 local
rpool/ROOT                autobackup:offsite1  true                 inherited from rpool
rpool/ROOT/pve-1          autobackup:offsite1  true                 inherited from rpool
rpool/data                autobackup:offsite1  true                 inherited from rpool
rpool/data/vm-100-disk-0  autobackup:offsite1  true                 inherited from rpool
rpool/data/vm-101-disk-0  autobackup:offsite1  true                 inherited from rpool
rpool/tmp                 autobackup:offsite1  true                 inherited from rpool

ZFS properties are inherited by child datasets. Since we've set the property on the highest dataset, we're essentially backing up the whole pool.

If we don't want to back up everything, we can exclude certain filesystems by setting the property to false:

[root@pve01 ~]# zfs set autobackup:offsite1=false rpool/tmp
[root@pve01 ~]# zfs get -t filesystem,volume autobackup:offsite1
NAME                      PROPERTY             VALUE                SOURCE
rpool                     autobackup:offsite1  true                 local
rpool/ROOT                autobackup:offsite1  true                 inherited from rpool
rpool/ROOT/pve-1          autobackup:offsite1  true                 inherited from rpool
rpool/data                autobackup:offsite1  true                 inherited from rpool
rpool/data/vm-100-disk-0  autobackup:offsite1  true                 inherited from rpool
rpool/data/vm-101-disk-0  autobackup:offsite1  true                 inherited from rpool
rpool/tmp                 autobackup:offsite1  false                local

The autobackup property can have these values:

  • true: Back up the dataset and all its children.
  • false: Don't back up the dataset or any of its children. (Excludes the dataset.)
  • child: Only back up the children of the dataset, not the dataset itself.
  • parent: Only back up the dataset itself, not its children. (Supported in version 3.2 or higher.)

(Note: Only use the zfs command to set these properties. Do not use the zpool command.)

Running zfs-autobackup

Run the script on the backup machine and pull the data from the source machine specified by --ssh-source.

[root@backup ~]# zfs-autobackup -v --clear-mountpoint --ssh-source pve01 offsite1 data/backup/pve01
  zfs-autobackup v3.1.1 - (c)2021 E.H.Eefting (edwin@datux.nl)
  
  Selecting dataset property : autobackup:offsite1
  Snapshot format            : offsite1-%Y%m%d%H%M%S
  Hold name                  : zfs_autobackup:offsite1
  
  #### Source settings
  [Source] Datasets on: pve01
  [Source] Keep the last 10 snapshots.
  [Source] Keep every 1 day, delete after 1 week.
  [Source] Keep every 1 week, delete after 1 month.
  [Source] Keep every 1 month, delete after 1 year.
  
  #### Selecting
  [Source] rpool: Selected
  [Source] rpool/ROOT: Selected
  [Source] rpool/ROOT/pve-1: Selected
  [Source] rpool/data: Selected
  [Source] rpool/data/vm-100-disk-0: Selected
  [Source] rpool/data/vm-101-disk-0: Selected
  [Source] rpool/tmp: Excluded
  
  #### Snapshotting
  [Source] Creating snapshots offsite1-20220107131107 in pool rpool
  
  #### Target settings
  [Target] Datasets are local
  [Target] Keep the last 10 snapshots.
  [Target] Keep every 1 day, delete after 1 week.
  [Target] Keep every 1 week, delete after 1 month.
  [Target] Keep every 1 month, delete after 1 year.
  [Target] Receive datasets under: data/backup/pve01
  
  #### Synchronising
  [Target] data/backup/pve01/rpool@offsite1-20220107131107: receiving full
  [Target] data/backup/pve01/rpool/ROOT@offsite1-20220107131107: receiving full
  [Target] data/backup/pve01/rpool/ROOT/pve-1@offsite1-20220107131107: receiving full
  [Target] data/backup/pve01/rpool/data@offsite1-20220107131107: receiving full
  [Target] data/backup/pve01/rpool/data/vm-100-disk-0@offsite1-20220107131107: receiving full
  [Target] data/backup/pve01/rpool/data/vm-101-disk-0@offsite1-20220107131107: receiving full
  
  #### All operations completed successfully
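The snapshot names in the output above follow the default --snapshot-format of {}-%Y%m%d%H%M%S, i.e. the backup name plus a local timestamp. As a sketch, the same name can be reproduced with date:

```shell
# reproduce the default snapshot name for the offsite1 job
echo "offsite1-$(date +%Y%m%d%H%M%S)"
```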

The results

As you might notice, zfs-autobackup preserves the whole parent path of the source.

So rpool/data/vm-100-disk-0 ends up as: data/backup/pve01/rpool/data/vm-100-disk-0

Since it's a backup, it's useful to preserve the original structure of the data like this.

Stripping the path

If you find this ugly, there is the --strip-path option. However, this can lead to collisions if two source datasets result in the same target path. Since version 3.1.2, zfs-autobackup checks for this and emits an error.

Making source and target paths look the same

If you want your source and target structure to look exactly the same, you have to do the following:

  • Select the whole source-pool. In this case: zfs set autobackup:offsite1=true rpool
  • Use --strip-path=1
  • Specify target-pool as target-path. In this case: data
  • Use the --force option the first time to overwrite the existing target pool. (New in v3.1.2)

Pull or push?

Note that this is called a "pull" backup. The backup (target) machine pulls the backup from the source machine. This is usually the preferred way.

It is also possible to let a source machine push its backup to the target machine. There are security implications to both approaches, as follows:

  • With a pull backup, the target machine will have ssh access to the source machine.
  • With a push backup, the source machine will have ssh access to the target machine.

If you wish to do a push backup, then you would set up the SSH keys the other way around and use the --ssh-target parameter on the source machine.

Note that you can always change the ssh source and target parameters at a later point without any problems.

Pull+push (zero trust)

It is also possible to use a third server that pulls backups from the source and pushes the data to the target server via one stream. This way the source and target servers don't have to be able to reach each other, and if one of them is compromised, it can't access the other.

To do this, you only have to install zfs-autobackup on the third server and use both --ssh-source and --ssh-target to specify the source and target servers.

Local Usage

It is also possible to run zfs-autobackup locally, where you could backup snapshots to a different pool on the same server. This is done by simply omitting the --ssh-source and --ssh-target parameters.

For example, let's say you have an additional pool for local backups called backups, that's on separate device(s) from your data pools. In this pool, you have a dataset called autobackup. You could run the following command (assuming you set the zfs group name to autobackup:local on your data filesystems):

zfs-autobackup -v local backups/autobackup

Combining this with a remote push or pull backup, you could then set the zfs group name on your backup filesystems to something like autobackup:remote, then have a second zfs-autobackup job that backs up these snapshots to your remote storage like:

zfs-autobackup -v --ssh-target root@backupserver remote data/backup/pve01

Automatic backups

Now every time you run the command, zfs-autobackup will create a new snapshot and replicate your data.

Older snapshots will eventually be deleted, depending on the --keep-source and --keep-target settings. The defaults are shown in the source and target settings of the example output above. Look at Thinner for more info.

Once you've got the correct settings for your situation, you can just store the command in a cronjob.

Or just create a script and run it manually when you need it.

Monitoring

Don't forget to monitor the results of your backups, look at Monitoring for more info.

Splitting up snapshot and backup job

You might want to make snapshots during the week, and only transfer data during the weekends.

In this case you would run this each weekday:

zfs-autobackup -v --ssh-source pve01 offsite1 data/backup/pve01 --no-send

And this on weekend days:

zfs-autobackup -v --ssh-source pve01 offsite1 data/backup/pve01
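As cron entries, the weekday/weekend split above could look like this (illustrative sketch; the schedule is an assumption):

```shell
# weekdays (Mon-Fri): snapshot only, no transfer
0 1 * * 1-5 root zfs-autobackup --ssh-source pve01 offsite1 data/backup/pve01 --no-send
# weekend (Sat-Sun): snapshot and transfer
0 1 * * 6,0 root zfs-autobackup --ssh-source pve01 offsite1 data/backup/pve01
```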

You can also create the snapshots in offline mode by using zfs-autobackup as a snapshot tool on the source side. This way the snapshots will always be created, even if the backup server is offline or unreachable.

Use as snapshot tool

You can use zfs-autobackup as a standalone snapshot tool.

To do this, simply omit the target-path, as follows:

zfs-autobackup -v --ssh-source pve01 offsite1

Only use this if you don't want to make any backup at all, or if a target isn't reachable during the snapshotting phase.

If you have offline backups, check out Common-snapshots-and-holds.

Specifying ssh port or options

The correct way to do this is by creating ~/.ssh/config:

Host smartos04
    Hostname 1.2.3.4
    Port 1234
    User root

This way you can just specify "smartos04" as the host.

Look in man ssh_config for many more options.

Multiple backups of the same data

You can use multiple zfs-autobackup jobs to transfer data to multiple targets. Just make sure that you use different backup names. This way the jobs should not interfere with each other: Each job only removes its own snapshots.

Using the same backup name

You CAN use the same backup name to transfer data to multiple targets. However in that case it's up to you to make sure that a common snapshot of one backup job isn't deleted by the other job.

One way to do this is to adjust the --keep-source option, or to make sure the backups run at close enough intervals.

However, to prevent confusion and stay flexible, I would advise always using distinct, clearly distinguishable names, e.g. autobackup:offsite and autobackup:local.

Tips

  • Use --clear-mountpoint to prevent all kinds of problems. See Mounting
  • Use --debug if something goes wrong and you want to see the commands that are executed. This will also stop at the first error.
  • Use these only once if needed: --force --destroy-incompatible --rollback. Don't add them to your script; try to solve the underlying cause if you keep needing them.
  • Set the readonly property of the target filesystem to on. This prevents changes on the target side. (Due to the nature of ZFS itself, if any changes are made to a dataset on the target machine, then the next backup to that target machine will probably fail. Such a failure can usually be resolved by performing a target-side zfs rollback of the affected dataset.) Note that readonly only prevents changes to the CONTENTS of the dataset. It's still possible to receive new datasets and manipulate properties.
  • Use --clear-refreservation to save space on your backup machine.
  • zfs-autobackup uses holds by default, so you might get "dataset busy" if you try to destroy a snapshot. (check zfs holds --help)
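For instance, the readonly tip above can be applied to the target path from the earlier example (illustrative fragment; run on the target machine against your own target path):

```shell
# prevent accidental target-side changes; receiving new snapshots still works
zfs set readonly=on data/backup/pve01
```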

Restore example

Restoring can be done with simple zfs commands. For example:

root@fs1:/home/psy#  zfs send fs1/zones/backup/zfsbackups/server01/vm01@offset1-20220110230003 | ssh root@2.2.2.2 "zfs recv rpool/restore"

More information

Usage

usage: ZfsAutobackup.py [--help] [--test] [--verbose] [--debug] [--debug-output] [--progress] [--utc] [--version] [--ssh-config CONFIG-FILE] [--ssh-source USER@HOST] [--ssh-target USER@HOST] [--property-format FORMAT] [--snapshot-format FORMAT] [--hold-format FORMAT] [--strip-path N] [--exclude-unchanged BYTES] [--exclude-received] [--no-snapshot] [--pre-snapshot-cmd COMMAND]
                        [--post-snapshot-cmd COMMAND] [--min-change BYTES] [--allow-empty] [--other-snapshots] [--set-snapshot-properties PROPERTY=VALUE,...] [--no-send] [--no-holds] [--clear-refreservation] [--clear-mountpoint] [--filter-properties PROPERTY,...] [--set-properties PROPERTY=VALUE,...] [--rollback] [--force] [--destroy-incompatible] [--ignore-transfer-errors]
                        [--decrypt] [--encrypt] [--zfs-compressed] [--compress [TYPE]] [--rate DATARATE] [--buffer SIZE] [--send-pipe COMMAND] [--recv-pipe COMMAND] [--no-thinning] [--keep-source SCHEDULE] [--keep-target SCHEDULE] [--destroy-missing SCHEDULE]
                        [BACKUP-NAME] [TARGET-PATH]

ZfsAutobackup.py v3.2 - (c)2022 E.H.Eefting (edwin@datux.nl)

positional arguments:
  BACKUP-NAME           Name of the backup to select
  TARGET-PATH           Target ZFS filesystem (optional)

Common options:
  --help, -h            show help
  --test, --dry-run, -n
                        Dry run, don't change anything, just show what would be done (still does all read-only operations)
  --verbose, -v         verbose output
  --debug, -d           Show zfs commands that are executed, stops after an exception.
  --debug-output        Show zfs commands and their output/exit codes. (noisy)
  --progress            show zfs progress output. Enabled automatically on ttys. (use --no-progress to disable)
  --utc                 Use UTC instead of local time when dealing with timestamps for both formatting and parsing. To snapshot in an ISO 8601 compliant time format you may for example specify --snapshot-format "{}-%Y-%m-%dT%H:%M:%SZ". Changing this parameter after-the-fact (existing snapshots) will cause their timestamps to be interpreted as a different time than before.
  --version             Show version.

SSH options:
  --ssh-config CONFIG-FILE
                        Custom ssh client config
  --ssh-source USER@HOST
                        Source host to pull backup from.
  --ssh-target USER@HOST
                        Target host to push backup to.

String formatting options:
  --property-format FORMAT
                        Dataset selection string format. Default: autobackup:{}
  --snapshot-format FORMAT
                        ZFS Snapshot string format. Default: {}-%Y%m%d%H%M%S
  --hold-format FORMAT  ZFS hold string format. Default: zfs_autobackup:{}
  --strip-path N        Number of directories to strip from target path.

Selection options:
  --exclude-unchanged BYTES
                        Exclude datasets that have less than BYTES data changed since any last snapshot. (Use with proxmox HA replication)
  --exclude-received    Exclude datasets that have the origin of their autobackup: property as "received". This can avoid recursive replication between two backup partners.

Snapshot options:
  --no-snapshot         Don't create new snapshots (useful for finishing uncompleted backups, or cleanups)
  --pre-snapshot-cmd COMMAND
                        Run COMMAND before snapshotting (can be used multiple times.)
  --post-snapshot-cmd COMMAND
                        Run COMMAND after snapshotting (can be used multiple times.)
  --min-change BYTES    Only create snapshot if enough bytes are changed. (default 1)
  --allow-empty         If nothing has changed, still create empty snapshots. (Same as --min-change=0)
  --other-snapshots     Send over other snapshots as well, not just the ones created by this tool.
  --set-snapshot-properties PROPERTY=VALUE,...
                        List of properties to set on the snapshot.

Transfer options:
  --no-send             Don't transfer snapshots (useful for cleanups, or if you want a separate send-cronjob)
  --no-holds            Don't hold snapshots. (Faster. Allows you to destroy common snapshot.)
  --clear-refreservation
                        Filter "refreservation" property. (recommended, saves space. same as --filter-properties refreservation)
  --clear-mountpoint    Set property canmount=noauto for new datasets. (recommended, prevents mount conflicts. same as --set-properties canmount=noauto)
  --filter-properties PROPERTY,...
                        List of properties to "filter" when receiving filesystems. (you can still restore them with zfs inherit -S)
  --set-properties PROPERTY=VALUE,...
                        List of properties to override when receiving filesystems. (you can still restore them with zfs inherit -S)
  --rollback            Rollback changes to the latest target snapshot before starting. (normally you can prevent changes by setting the readonly property on the target_path to on)
  --force, -F           Use zfs -F option to force overwrite/rollback. (Useful with --strip-path=1, but use with care)
  --destroy-incompatible
                        Destroy incompatible snapshots on target. Use with care! (implies --rollback)
  --ignore-transfer-errors
                        Ignore transfer errors (still checks if received filesystem exists. useful for acltype errors)
  --decrypt             Decrypt data before sending it over.
  --encrypt             Encrypt data after receiving it.
  --zfs-compressed      Transfer blocks that already have zfs-compression as-is.

Data transfer options:
  --compress [TYPE]     Use compression during transfer, defaults to zstd-fast if TYPE is not specified. (gzip, pigz-fast, pigz-slow, zstd-fast, zstd-slow, zstd-adapt, xz, lzo, lz4)
  --rate DATARATE       Limit data transfer rate in Bytes/sec (e.g. 128K. requires mbuffer.)
  --buffer SIZE         Add zfs send and recv buffers to smooth out IO bursts. (e.g. 128M. requires mbuffer)
  --send-pipe COMMAND   pipe zfs send output through COMMAND (can be used multiple times)
  --recv-pipe COMMAND   pipe zfs recv input through COMMAND (can be used multiple times)

Thinner options:
  --no-thinning         Do not destroy any snapshots.
  --keep-source SCHEDULE
                        Thinning schedule for old source snapshots. Default: 10,1d1w,1w1m,1m1y
  --keep-target SCHEDULE
                        Thinning schedule for old target snapshots. Default: 10,1d1w,1w1m,1m1y
  --destroy-missing SCHEDULE
                        Destroy datasets on target that are missing on the source. Specify the time since the last snapshot, e.g: --destroy-missing 30d

Full manual at: https://github.com/psy0rz/zfs_autobackup