
Windows 32/64 binary, FreeBSD statically linked

@fcorbelli released this 30 Dec 19:18

First (public) release with zfsbackup-zfsreceive-zfsrestore


NOTICE. There are virtually no tests on input parameters, so use caution. If you need help, just ask. The next release will include stringent checks


And now... THE SPIEGONE! (Italian for "the big explanation")

This release contains numerous features, both general-purpose and zfs-specific

The first function is versum, something similar to a "smarter" hashdeep

Basically: it verifies the hashes of the files in the filesystem against a list in a text file.
We want to verify that the zfs backup-restore works well, without blindly trusting it.
It can be fed with two types of files: those created by zpaqfranz itself, and those of hashdeep.

The former are written by the sum function with the appropriate switches (e.g. zpaqfranz sum *.txt -xxh3). BTW zpaqfranz can both write and read (-hashdeep switch) hashdeep's file format.

In the following examples we will operate on the tank/d dataset, on SSD/NVMe, working on the snapshot fc (yep, francocorbelli)

You can use all the hash types of zpaqfranz; in this example it will be xxhash64.
Incidentally, -forcezfs is used to have the example folder examined (its path contains .zfs, being a snapshot); otherwise zpaqfranz would ignore it

zpaqfranz sum /tank/d/.zfs/snapshot/fc -forcezfs -ssd -xxhash -noeta -silent -out /tmp/hash_xx64.txt

A possible alternative, for third-party control (i.e. software other than zpaqfranz), is hashdeep,
usually found in the md5deep package.
The essential difference between hashdeep and md5deep is multithreading: hashdeep reads files from disk in parallel, so we will assume solid-state disks (or... wait longer :) )

Various hashes can be selected, but since they are basically used as checksums and not as cryptographic signatures, md5 is more than fine (it is the fastest), at least for me

hashdeep -c md5 -r /tank/d/.zfs/snapshot/fc >/tmp/hashdeep.txt

BTW hashdeep does not have a find/replace function; awk or sed is commonly used instead, as in the sketch below. Uncomfortable, to say the least
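For example, a minimal sed sketch (using the paths of this example; the output file name is hypothetical) to rewrite the source prefix inside the hashdeep list:

sed 's|/tank/d/.zfs/snapshot/fc|/tank/d|g' /tmp/hashdeep.txt > /tmp/hashdeep_fixed.txt

zpaqfranz's -find/-replace switches (shown below) make this step unnecessary.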

To check the contents of the filesystem we have three options

  1. the zpaqfranz hash list
    In this example a multithreaded run (-ssd) is used, with a rename (-find/-replace) to convert the paths in the source file to the target ones

zpaqfranz versum z:\uno\_tmp\hash_xx64.txt -ssd -find /tank/d/.zfs/snapshot/fc -replace z:\uno\_tank\d

  2. the hashdeep list

zpaqfranz is able to 'understand' the original hashdeep format (note the -hashdeep switch)

zpaqfranz versum z:\uno\_tmp\hashdeep.txt -hashdeep -ssd -find /tank/d/.zfs/snapshot/fc -replace z:\uno\_tank\d

  3. small-scale test, without reading from the filesystem
    If the hash function used to create the .zpaq file is the same as that of the .txt control file, you can operate as follows

zpaqfranz versum z:\uno\_tmp\hash_xx64.txt -to thebak.zpaq  -find /tank/d/.zfs/snapshot/fc -replace /tank/d

It should be remembered that the default hash of zpaqfranz is xxhash64, so if you want to use other hashes (e.g. xxh3, sha256 or sha3 etc.) you must, when creating the .zpaq file (the a command), add the relevant switch (e.g. -xxh3, -sha3, -blake3 etc.), for example as sketched below
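A minimal sketch (the output file name is hypothetical): create both the hash list and the archive with the same non-default hash (here xxh3), so they can be compared later with versum -to

zpaqfranz sum /tank/d/.zfs/snapshot/fc -forcezfs -ssd -xxh3 -noeta -silent -out /tmp/hash_xx3.txt
zpaqfranz a /tmp/thebak.zpaq /tank/d/.zfs/snapshot/fc -xxh3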

Recap

Complete example of creating an archive (on FreeBSD with zfs), to be extracted later on Windows with independent verification

The source will be tank/d, using -ssd for multithreading

Take the snapshot fc of tank/d

zfs snapshot tank/d@fc

Get the hash list with xxhash64 into the file /tmp/hash_xx64.txt

zpaqfranz sum /tank/d/.zfs/snapshot/fc -forcezfs -ssd -xxhash -noeta -silent -out /tmp/hash_xx64.txt

Create the hashdeep list with md5 into /tmp/hashdeep.txt (using md5 because it is very fast).
WARNING: /sbin/zfs set snapdir=hidden tank/d may be required to "hide" .zfs folders from hashdeep. There is no easy way to exclude folders in hashdeep

hashdeep -c md5 -r /tank/d/.zfs/snapshot/fc >/tmp/hashdeep.txt

Now make the backup (fixing the paths with -to).
In this case the default hash function (xxhash64) is used, matching hash_xx64.txt.
We "inject" the two hash lists, /tmp/hash_xx64.txt and /tmp/hashdeep.txt, to keep them with the archive

zpaqfranz a /tmp/thebak.zpaq /tank/d/.zfs/snapshot/fc /tmp/hash_xx64.txt /tmp/hashdeep.txt -to /tank/d

Destroy the snapshot

zfs destroy tank/d@fc

Now transfer thebak.zpaq to Windows somehow (usually with rsync), for example as below
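A hypothetical sketch (host name and destination path are placeholders; any transfer method will do):

rsync -av --progress /tmp/thebak.zpaq user@winhost:/z/uno/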

Extract everything to z:\uno (note the -longpath)

zpaqfranz x thebak.zpaq -to z:\uno -longpath

Verify the files against zpaqfranz's hash list.
Note the -find and -replace to fix the source (FreeBSD) and destination (Windows) paths

zpaqfranz versum z:\uno\_tmp\hash_xx64.txt -ssd -find /tank/d/.zfs/snapshot/fc -replace z:\uno\_tank\d

Now a paranoid double-check with hashdeep.
Please note the -hashdeep switch

zpaqfranz versum z:\uno\_tmp\hashdeep.txt -hashdeep -ssd -find /tank/d/.zfs/snapshot/fc -replace z:\uno\_tank\d

Finally, compare the hashes in the txt file with those stored in the .zpaq

zpaqfranz versum z:\uno\_tmp\hash_xx64.txt -to thebak.zpaq -find /tank/d/.zfs/snapshot/fc -replace /tank/d

Short version: this is an example of how to verify, on a completely different system (Windows), a copy made from a .zfs snapshot with zpaqfranz. We will now see how, in reality, all this is designed for "real" zfs backup-restore

New advanced option: -stdout

If the files are stored in order inside the .zpaq, it is possible to extract them with -stdout

WHAT?

Files stored within a .zpaq are divided into fragments (let's say 'chunks') which, in general, are not sorted.
This happens for various reasons (I will not elaborate), and it prevents extracting files in stream form, i.e. as a plain sequence of bytes, as required by -stdout

This is not normally a serious problem for zpaq 7.15, as it simply does not support mixing streamed and journaled files in the same archive

Translation of the translation (!)

zpaq started out as a stream compressor (actually no, there would be a further very long explanation here, which I will spare you).
It processes a long sequence of bytes, one byte at a time, and writes a sequence of bytes in output: this is the so-called streamed format

It was present in older versions of zpaq, something analogous to gz just to give a known example.

Subsequently, the developer of zpaq (Matt Mahoney) implemented the so-called 'journaled' storage format, where each file carries its various versions.
This is the 'normal' format, while the 'streamed' one has practically disappeared (vestiges remain in the source).

For a whole series of technical reasons that I won't go into here, Mahoney decided not to allow mixing the two types:

  • archives WITH VERSIONS (aka: modern)

XOR

  • with streamed files (aka: OK for stdout)

The ability to write to stdout does not have much reason to exist unless coupled with the ability to read from stdin, and zpaq 7.15 does not allow this: it essentially operates by reading files from the filesystem "the usual way".

As you may have noticed (?), for some time now I have been evolving zpaqfranz to allow the processing of input streams (with -stdin)

The concrete reason is twofold.

The first is to archive mysql dumps, whose tools (mysqldump and various similar ones) output precisely a text stream.
This way you can use zpaqfranz to archive them versioned (which, as far as I know, is superior to practically any other system, by a WIDE margin), as in the sketch below.
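A minimal sketch (database name and paths are hypothetical), using the same -stdin pattern shown further down this page:

mysqldump mydb | zpaqfranz a /backup/mydb.zpaq mydb.sql -stdin

Every run appends a new version of mydb.sql to the same archive, deduplicated against the previous ones.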

The second is to make sector-level copies of Windows drives, in particular the C: disk.

As you may have noticed (?), zpaqfranz is now able to back up (from within Windows) an entire system, either 'file-based' (with a VSS) or 'dd-style'

Obviously the 'dd' method will take up more space (it is a good idea to use the f command to fill the free space with zeros first) and will also be slower

BUT
it allows you to mount (with other software) / extract (e.g. with 7z) almost everything

If you are really paranoid (like me), what could be better than a backup of every sector of the disk?

Let us now return to why it is so important (and actually not trivial) to obtain archives in journaled format but with the possibility of ordered (= streamed) extraction

It is about speed

Streamed .zpaq archives exist, BUT listing them (i.e. enumerating the files therein) is very slow, requiring a scan of the entire file (which can be hundreds of GB = minutes)

They are also extremely slow to create (with the zpaqd 7.15 utility), being essentially single-threaded (~10MB/s)

Instead, with journaled (i.e. 'normal' zpaq format) but ORDERED archives, I can obtain all the benefits:

various versions of the same file, listing speed, and even creation speed (keeping multithreading), at the cost of a (not excessive) slowdown in output, due to the use of -stdout instead of the more efficient filesystem

Why all this?

For zfs backups, of course, and especially restorations

There are now three new commands (actually very crude, to be developed, but that's the "main thing")

  1. zfsbackup
  2. zfsrestore
  3. zfsreceive

One normally uses .zpaq to make differential zfs backups, i.e. one base file and N differential files, which are stored as different versions. This is good, it works well, and it is not fragile (differential means that two files are enough to restore). This is the "normal" method for "older" zpaqfranz, sketched below.
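A minimal sketch of this classic method (snapshot names and paths are hypothetical, not the actual zfsbackup internals):

zfs snapshot tank/d@base
zfs send tank/d@base > /tmp/base.zfs
zfs snapshot tank/d@today
zfs send -i tank/d@base tank/d@today > /tmp/diff.zfs
zpaqfranz a /backup/differential.zpaq /tmp/base.zfs /tmp/diff.zfs

Each new run re-sends everything changed since @base.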

BUT

it takes up space: the differential snapshots get bigger and bigger, a normal problem for any differential system

On the other hand, using incremental zfs snapshots has always been very risky and fragile, because it takes only a small failure to make the recovery fail.

The zfsbackup command masks the complexity (of backing up a single dataset), being extremely space-efficient and taking minimal time
How?
Let's see an example

Just 3 commands to add another level of verification...

# create a marker file (just for this example)
touch /tank/d/momentoincrementale
# hide the .zfs folder from hashdeep
/sbin/zfs set snapdir=hidden tank/d
# store the hashdeep list (as elenco.txt) inside hasho.zpaq, via -stdin
/usr/local/bin/hashdeep -c md5 -r /tank/d |/usr/local/bin/zpaqfranz a /tank/d/hasho.zpaq elenco.txt -stdin

and now the "real" one (warning: NO input checks!)

/usr/local/bin/zpaqfranz zfsbackup /temporaneo/incrementale.zpaq tank/d

That's all

The dataset tank/d will be "snapshotted" and archived into the /temporaneo/incrementale.zpaq archive

Now look at an l (list) with the new -stdout

H:\backup\rambo_incrementale>zpaqfranz l incrementale.zpaq -stdout
incrementale.zpaq:
23 versions, 23 files, 157.051 fragments, 4.968 blocks, 66.872.360.363 bytes (62.28 GB)
AVAILABLE -stdout             23


- 2022-12-17 15:58:42      71.790.160.936  0644 [STDOUT] 00000001.zfs
- 2022-12-17 16:22:59              49.744  0644 [STDOUT] 00000002.zfs
- 2022-12-17 16:30:19              45.016  0644 [STDOUT] 00000003.zfs
- (...)
- 2022-12-22 09:00:00       4.579.140.520  0644 [STDOUT] 00000015.zfs
- 2022-12-24 20:01:41         193.543.496  0644 [STDOUT] 00000016.zfs
(...)
       81.605.788.800 (76.00 GB) of 81.605.788.800 (76.00 GB) in 23 files shown
       66.872.360.363 compressed

The files are compatible with STDOUT (because they were created by this new zpaqfranz with -stdin)

In general this is not possible: in the following example the first two files can be streamed to stdout, the latter two... maybe not

(...)
- 2019-11-13 13:54:03              21.326  0666 [STDOUT] /tank/d/ALESSANDRA/00086140 Verbale spl.docx
- 2019-12-13 15:26:50             934.440  0666 [STDOUT] /tank/d/ALESSANDRA/00086216 COMPUTO LAVORI.doc
- 2020-01-02 09:51:57          22.114.816  0666 /tank/d/ALESSANDRA/00086216 perizia.doc
- 2019-12-13 15:27:02             943.598  0666 /tank/d/ALESSANDRA/00086216 verb.pdf
(...)

Recap: at every zpaqfranz zfsbackup run a new snapshot is made, as in this example

root@aserver:~ # zpaqfranz zfslist "*" "zpaqfranz"
zpaqfranz v56.4h-JIT-L archiver,  (26 Dec 2022)
tank/d@zpaqfranz00000001
tank/d@zpaqfranz00000002
tank/d@zpaqfranz00000003
(...)
tank/d@zpaqfranz00000023

0.048 seconds (00:00:00) (all OK)

(in fact, using -kill will destroy the snapshots, leaving only the last one, to save space)
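For example (a sketch, reusing the same archive and dataset as above):

/usr/local/bin/zpaqfranz zfsbackup /temporaneo/incrementale.zpaq tank/d -kill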

Now the problem moves to data recovery

There are basically two ways

The first is to extract all .zfs files from the zpaq archive, and then restore them in sequence
"Old" style (aka: zpaq 7.15)

  1. Extract the .zfs files into some folder, like

zpaqfranz x incrementale.zpaq -to estratti

  2. Then zfsrestore
    In this example a 1.sh script will be prepared, to restore the data from the ./estratti folder into the
    (new) dataset rpool/saveme [of course rpool must exist, and saveme must NOT!]

zpaqfranz zfsrestore ./estratti rpool/saveme -out 1.sh

The script is something like this (it automagically uses pv if found)

cat ./estratti/00000001.zfs |/usr/bin/pv| zfs receive rpool/saveme@franco00000001
cat ./estratti/00000002.zfs |/usr/bin/pv| zfs receive rpool/saveme@franco00000002
(...)
cat ./estratti/00000023.zfs |/usr/bin/pv| zfs receive rpool/saveme@franco00000023

So far, so good
BUT
it takes up double the space.

If the data originally occupied 100GB, let's say 100GB will be needed (in which to extract the .zfs files)
AND
another 100GB in which to restore them (actually more, but it's just an example)

The "new style" (to be tested!) is zfsreceive

zpaqfranz zfsreceive incrementale.zpaq tank/ripri1 -out doreceive.sh

You will get a script to be run

./doreceive.sh

With something like...

/usr/local/bin/zpaqfranz x incrementale.zpaq 00000001.zfs -stdout |/usr/bin/pv| zfs receive tank/ripri1@franco00000001
/usr/local/bin/zpaqfranz x incrementale.zpaq 00000002.zfs -stdout |/usr/bin/pv| zfs receive tank/ripri1@franco00000002
(...)
/usr/local/bin/zpaqfranz x incrementale.zpaq 00000023.zfs -stdout |/usr/bin/pv| zfs receive tank/ripri1@franco00000023

In this case the individual files are extracted by zpaqfranz and streamed ("piped") straight into zfs receive, halving the space needed to restore (and saving time too)

versum - a paranoid test for zfs backups

And (turning back to the example) if you are really, really, really paranoid, you can test
the hashdeep file against the restored zfs, just to be really sure

cd /tank/ripri1
zpaqfranz x hasho.zpaq

zpaqfranz versum elenco.txt -hashdeep -ssd -find /tank/d -replace /tank/ripri1
zpaqfranz v56.4h-JIT-L archiver,  (26 Dec 2022)
franz:-ssd -hashdeep
franz:find       <</tank/d>>
franz:replace    <</tank/ripri1>>
60188: Loading text files
60134: Working on hashdeep  format <<elenco.txt>>
60451: hasher selected: md5
60188: Loading finished

Creating 4 hashing thread(s) [2]

----------------------------------------------------------------------------
Files               100.090 vs exp               100.090  this is good
Bytes        69.533.786.054 vs exp        69.529.738.801  *** DIFFER 4.047.253 ***
----------------------------------------------------------------------------
60402: ERROR md5 |file=771CD0E8408C16FA596146A6E31ED4BC|txt=228A410DD737F57A623D41979898AD18| /tank/ripri1/hasho.zpaq
----------------------------------------------------------------------------
Time           87.436 s, average speed 795.253.511/s (758.41 MB/s)

This is good: everything is fine EXCEPT hasho.zpaq.
And it must be different! (hashdeep was scanning /tank/d while zpaqfranz was still writing hasho.zpaq into that very folder)

Using the -hashdeep switch (during zfsbackup), if hashdeep is reachable, a checklist will be "magically" added to the archive
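Presumably invoked like this (a sketch; hashdeep must be installed and reachable in the PATH):

/usr/local/bin/zpaqfranz zfsbackup /temporaneo/incrementale.zpaq tank/d -hashdeep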

Short (!) version

By synergistically using the zfs and zpaqfranz replication tools, "everything" (attributes etc.) can be preserved with substantially minimal space occupation AND very short times (the filesystem scan performed by any program is always slow, while the zfs snapshot computation takes a few hundred milliseconds).
If you do not trust even zfs (yep, I do NOT trust it) you can use hashdeep (solid-state drives highly recommended, of course) to be REALLY sure that everything ran fine after the recovery

Recovery (especially for large amounts of data) takes time and space
That's why, generally, it's a strategy that I ADD to the classic zpaq one

In this release I have included a statically linked binary for FreeBSD, to make it easier to test on systems such as TrueNAS

Download zpaqfranz