Releases: fcorbelli/zpaqfranz
Windows 32/64 binary, HW accelerated, Linux, FreeBSD
This is a brand new branch, full of bugs, ehm "features" :)
HW accelerated SHA1/SHA2
Up to version 57 hardware acceleration was only available in the Windows build (zpaqfranzhw.exe)
From version 58 (obviously still to be tested) it can also be enabled on other systems (newer Linux/BSD-based AMD/Intel), via the compile-time switch -DHWSHA2
zpaqfranz (should) then autodetect the availability of those CPU extensions; nothing is needed from the user
It is possible to force it with the -hw switch
To see more "things" use b -debug
TRANSLATION
If you compile with -DHWSHA2 you will get something like this
zpaqfranz v58.1e-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2023-03-21)
In this example this is an Intel (JIT) executable, with a (kind of) GUI (on Windows), HW BLAKE3 acceleration, SHA1/2 HW acceleration, and the Windows 64-bit SFX module (build 55.1)
So far, so good
Then run
zpaqfranz b -debug
If you are lucky you will get something like
(...)
zpaqfranz v58.1e-JIT-GUI-L,HW BLAKE3,SHA1/2,SFX64 v55.1,(2023-03-21)
FULL exename <<C:/zpaqfranz/release/58_1/zpaqfranz.exe>>
42993: The chosen algo 3 SHA-1
1838: new ecx 2130194955
1843: new ebx 563910569
SSSE3 :OK
SSE41 :OK
SHA :OK
DETECTED SHA1/2 HW INSTRUCTIONS
(...)
zpaqfranz will "automagically" run HW acceleration, because your CPU has the SSSE3, SSE4.1 and SHA extensions
Of course if you get a "NO"... bye bye
CPUs of this kind should be the AMD Zen family (Ryzen, Threadripper, etc.), Intel mobile 10th generation and later, and Intel desktop 11th generation and later
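zpaqfranz does this detection itself; as a rough cross-check on Linux you can look at the kernel's CPU flags. This is a hedged sketch, not zpaqfranz's detection code, and the /proc/cpuinfo flag names (ssse3, sse4_1, sha_ni) are an assumption about a Linux kernel:

```shell
# Check the same three CPU features zpaqfranz looks for, via /proc/cpuinfo
flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null)
hw=yes
for f in ssse3 sse4_1 sha_ni; do
  case " $flags " in
    *" $f "*) echo "$f :OK" ;;
    *)        echo "$f :NO"; hw=no ;;
  esac
done
echo "sha-hw=$hw"
```

If all three flags are present, the hardware path should be usable; on non-Linux systems (or non-x86 CPUs) this check simply reports NO for everything.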
BTW the old zpaqfranzhw.exe (Win64) is
zpaqfranz v58.1e-JIT-GUI-L,HW BLAKE3,SHA1,SFX64 v55.1,(2023-03-21)
Beware: this is SHA1 acceleration, NOT SHA1/2. Therefore you will need to enable it manually with the -hw switch
RECAP
- With -DHWSHA2 enabled, zpaqfranz will detect and use the HW acceleration, if it thinks your CPU supports it
- If, for some reason, you want to force its use, even on CPUs that do not officially have these extensions, use the switch -hw; usually you will get a segmentation fault or something like that (depending on the operating system), not my fault
- If you want to know if zpaqfranz "thinks" that your CPU is enabled, use zpaqfranz b -debug and look at the output
- Will you get a huge improvement in compression times? No, not really. You will have the biggest difference if you use SHA256 hashing functions, which benefit so much from the acceleration. SHA1 much less (the software version is already very fast)
- Is -DHWSHA2 faster than -DHWSHA1 ? In fact, no. SHA1 is "just a tiny bit" faster. Why? Too long to explain.
- Why does my even relatively modern Intel CPU not seem to support it? Who knows; the short version: not my fault. Intel has not equipped even relatively recent CPUs with these extensions
- Does it work on SPARC-ARM-PowerPC-whatever-strange-thing? Of course NO
- Is it production-safe? Of course NOT. As with any very first release, some nasty things can happen
Luke, remember. The more feedback, the more bug-fixing. Luke, report bugs, use the Force...
And don't forget the github star and SourceForge review! (I am becoming like a youtuber who invites people to subscribe to channels LOL)
Other news
Some refactoring, to become more "Mac-friendly" (here the risk of introducing bugs is considerable, sorry, I will correct them as I go along)
Using MD5 instead of XXH3 in checktxt (supporting Hetzner storagebox, there is still work to be done)
Some "GUI" improvement (In perspective, I am preparing the possibility of selecting some files to extract, but it still needs development)
No more dd embedded (smaller source size)
Windows 32/64 binary, 64 bit-HW accelerated
Changed help
Rationalisation of help
zpaqfranz
zpaqfranz h
zpaqfranz h h
zpaqfranz h full
Multioperation (with wildcards)
In commands t and x (test, extract)
zpaqfranz t *.zpaq ...
zpaqfranz x pippo*.zpaq...
Initial (kind of) text based GUI (Windows)
The new gui command opens a (rudimentary) ncurses-based GUI for listing, sorting, selecting and extracting files
Yes, I know, the vim-style syntax is not exactly user friendly, there will be future improvements
Under Windows, compiling with the -DGUI switch, you can do something like
zpaqfranz gui 1.zpaq
The vim-like commands are
f F / => find substring
Cursor arrow up-down left-right => page up, page down, line up, line down
- => move line + -
: => goto line
m M => set minsize Maxsize
d D => set datefrom Dateto
q Q ESC => exit
F1 sort name, F2 sort size, F3 sort date, F4 sort ext, F5 sort hash
F6 show size, F7 show date, F8 show hash, F9 show stdout
t => change -to
s => searchfrom
r => replace to
x => extract visible rows
In this example we want to extract all the .cpp files as .bak from the 1.zpaq archive. This is something you typically cannot do with other archivers such as tar, 7z, rar etc.
With a "sort of" WYSIWYG 'composer'
First f key (find) and entering .cpp
Then s (search) every .cpp substring
Then r (replace) with .bak
Then t (to) for the z:\example folder
Finally x to run the extraction
gui.mp4
In the medium term, in addition to bug fixes, box filters etc., there will be a PAKKA-style sorted list, or time machine style, with versions of individual files
Windows 32/64 binary, HW accelerated, Linux, FreeBSD
New command: 1on1
Deduplicate a folder against another one, by filenames and checksum, or only checksum
Julius Erving and Larry Bird Go One on One
A file-level deduplication function, to identify files inside folders that have been "manually" duplicated, e.g. by copy-paste
I did not find portable and especially fast programmes: they often use very... stupid approaches (NxM comparisons), with quite high slowdowns.
By using the -ssd switch it is possible to activate multithreading, which allows, in the real world, performance above 1GB/s
To make things clear: the files inside the -deleteinto folder are the ones that will (in case) be deleted
Dry run (no -kill), =hash,=filename,multithread
zpaqfranz 1on1 c:\dropbox -deleteinto z:\pippero2 -ssd
Real run (because -kill), 0-files too
zpaqfranz 1on1 c:\dropbox -deleteinto z:\pippero2 -zero -kill
Real run, with XXH3, with everything (even files inside .zfs). This will delete files with a DIFFERENT name BUT the same content
zpaqfranz 1on1 c:\dropbox -deleteinto z:\pippero2 -xxh3 -kill -forcezfs
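The idea behind 1on1 can be sketched with standard tools. This is a naive, single-threaded illustration (nothing like zpaqfranz's multithreaded implementation), with throwaway example folders:

```shell
# Build two folders where "target" contains one duplicate of "master"
master=$(mktemp -d); target=$(mktemp -d); hashes=$(mktemp)
echo "hello" > "$master/a.txt"
echo "hello" > "$target/copy.txt"   # same content, different name
echo "world" > "$target/keep.txt"
# Hash master once, then flag target files whose checksum already exists
( cd "$master" && find . -type f -exec sha256sum {} + ) | awk '{print $1}' | sort -u > "$hashes"
dupes=$( ( cd "$target" && find . -type f -exec sha256sum {} + ) | grep -F -f "$hashes" | awk '{print $2}' )
echo "would delete from target: $dupes"
rm -rf "$master" "$target"; rm -f "$hashes"
```

This is the content-only comparison (like -xxh3 -kill without filename matching); hashing each side once is what avoids the NxM comparisons mentioned above.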
Updated zfs-something commands
zfsadd
It now supports almost every zpaqfranz switch, getting the timestamp from the snapshot itself, not from the snapshot name
Suppose you have something like this
tank/pippo@franco00000001
tank/pippo@franco00000002
tank/pippo@franco00000003
(...)
tank/pippo@franco00001025
You want to purge those snapshots, but retain the data, getting everything inside consolidated.zpaq
zpaqfranz zfsadd /tmp/consolidated.zpaq "tank/pippo" "franco" -force
You can also take just a single folder; read the help!
Then you can purge with
zpaqfranz zfspurge "tank/pippo" "franco" -script launchme.sh
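What the generated launchme.sh boils down to can be sketched as a dry run. The shape of the script is an assumption (the real zfspurge output may differ); the snapshot names are the ones from the example above:

```shell
# Given a list of snapshots (as 'zfs list -t snapshot' would print them),
# emit one destroy command per snapshot carrying the "franco" mark
snaps="tank/pippo@franco00000001
tank/pippo@franco00000002
tank/pippo@franco00001025"
script=$(mktemp)
printf '%s\n' "$snaps" | grep '@franco' | sed 's/^/zfs destroy /' > "$script"
head -n 1 "$script"
```

Reviewing the generated script before running it is the whole point of -script: nothing is destroyed until you execute it yourself.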
This method is certainly slow, because it requires an exorbitant amount of processing. However, the result is to obtain a single archive that keeps the data in a highly compressed format, which can eventually be extracted at the level of a single version-snapshot
In short, long-term archiving for anti-ransomware policy
Improved zfsreceive
This VERY-long-term archiving of zfs snapshots is now tested with 1000+ snapshots on 300GB+ datasets; it should be fine
Example: "unpack" all zfs snapshots (made by zpaqfranz zfsbackup command) from ordinato.zpaq into
the new dataset rpool/restored
zpaqfranz zfsreceive /tmp/ordinato.zpaq rpool/restored -script myscript.sh
Then run the myscript.sh
Windows 32/64 binary, HW accelerated, ESXi, Linux, Free/Open BSD
Initial support for proxmox backup/restore, on zfs
Proxmox is a debian-based virtualiser that I like, it has a similar style to what I would have done myself.
It has a particular backup mechanism (with an external product, proxmox backup) that is very interesting (it looks like a free Nakivo, for those involved in virtualisation).
The 'internal' backups are done by vzdump, also good, BUT which operates with 'normal' compression (zstd for example), without deduplication AND WITHOUT ENCRYPTION (this is bad, very bad)
proxmox also supports zfs storage, but in a way I do not like much, namely zvols
For those not used to zfs, these are 'volumes' written in blocks, so they are not accessible as files (yep, sometimes not 'everything' is a file)
The aim is to have more performance by removing the 'intermediate' layer of virtual-disks-on-files.
However, this makes backups a real nightmare, as there is no 'easy' way to make them (there are no .vmdk or .raw to copy back and forth)
It is possible to force this behaviour (i.e. save to file instead of zvol). I will not go into details.
In this hypothesis (i.e. that the virtual machines are 'real' files, normally present in the /var/lib/vz folder, itself on a zfs storage), I have implemented two new functions for zpaqfranz to make zfs-based-snapshotted proxmox-backups
To reiterate: proxmox supports a thousand different types of storage, I chose the one I like, maybe in the future I will make zpaqfranz more 'smart'. For now I basically use it for my proxmox server backups, together with proxmox backup (better two technologies than one)
proxmox, although opensource, is commercially supported by a German company and, therefore, they are not very keen, understandably, on alternative tools to those offered by them.
(they deleted a thread on their forum without any explanation :)
I tried to contact the developer of proxmox backup systems by e-mail, with no response.
So I share - with anyone who is interested - my little experience
zfsproxbackup
Archiving proxmox backups (from zfs local storage), getting VM disks from /var/lib/vz
- -force Destroy temporary snapshot (before backup)
- -kill Remove snapshot (after backup)
- -all Get all VM
- -not Do not backup (exclude) VMs
- -snapshot kj Make 'snapname' to kj (default: francoproxmox)
Backup/encrypt w/key 'pippo' VM 200 zfsproxbackup /bak/200.zpaq 200 -force -kill -key pippo
Backup 2 VMs: 200 and 300 zfsproxbackup /bak/200_300.zpaq 200 300
Backup ALL VMs zfsproxbackup /bak/all.zpaq -all -force -kill
Backup all EXCEPT 200 and 300 zfsproxbackup /bak/part.zpaq -all -not 200 -not 300 -force -kill
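Pieced together from the log that follows, the sequence amounts to the following echo-based dry run (the exact internals are an assumption; replace echo with real execution):

```shell
pool=zp0/zd0; mark=francoproxmox; vm=200
run() { echo "$@"; }                 # dry run: print instead of execute
run zfs destroy "$pool@$mark"        # destroy stale snapshot, if any
run zfs snapshot "$pool@$mark"       # freeze the VM disks
run zpaqfranz a "/backup/$vm.zpaq" "/var/lib/vz/.zfs/snapshot/$mark/images/$vm"
run zfs destroy "$pool@$mark"        # release the snapshot
```

Archiving from the snapshot path (rather than the live /var/lib/vz) is what makes the backup consistent even while the VM is running.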
This is a "real world" example of backing up a zfs-based FreeBSD mailserver
Please note the size taken
/usr/local/bin/zpaqfranz zfsproxbackup /backup/200_posta_G6.zpaq 200 -force -kill -key pippo
zpaqfranz v57.3f-JIT-L, (14 Feb 2023)
franz:-key (hidden)
franz:-force -kill
zfsproxmox-backup VERY EXPERIMENTAL!
Works only on /var/lib/vz VM disk(s)
and on /etc/pve/qemu-server/ config(s)
37720: running Searching vz from zfs list...
38072: Founded pool <<zp0/zd0>>
53549: Archive /backup/200_posta_G6.zpaq
53550: Pool zp0/zd0
53550: Purged Pool zp0_zd0
53552: Mark francoproxmox
38135: VM Path 000 /var/lib/vz/.zfs/snapshot/francoproxmox/images/200
37720: running Destroy snapshot (if any)
38162: x_one zfs destroy zp0/zd0@francoproxmox
37720: running Taking snapshot
38162: x_one zfs snapshot zp0/zd0@francoproxmox
/backup/200_posta_G6.zpaq:
15 versions, 18 files, 693.419 frags, 3.215 blks, 7.764.776.789 bytes (7.23 GB)
Updating /backup/200_posta_G6.zpaq at offset 7.764.776.789 + 0
Adding 42.954.981.615 (40.00 GB) in 2 files (1 dirs), 8 threads @ 2023-02-14 15:45:30
(001%) 1.00% 00:08:55 ( 409.63 MB)->( 0.00 B) of ( 40.00 GB) 81.93 MB/se
(002%) 2.00% 00:07:07 ( 819.29 MB)->( 0.00 B) of ( 40.00 GB) 102.41 MB/se
(...)
(099%) 99.00% 00:00:03 ( 39.60 GB)->( 52.12 MB) of ( 40.00 GB) 107.57 MB/se
(100%) 100.00% 00:00:00 ( 40.00 GB)->( 52.12 MB) of ( 40.00 GB) 107.52 MB/se
1 +added, 0 -removed.
7.764.776.789 + (42.954.981.615 -> 692.206.304 -> 57.182.470) = 7.821.959.259 @ 107.42 MB/s
37720: running Destroy snapshot (if any)
38162: x_one zfs destroy zp0/zd0@francoproxmox
381.635 seconds (000:06:21) (all OK)
zpaqfranz v57.3f-JIT-L, (14 Feb 2023)
franz:-key (hidden)
200_posta_G6.zpaq:
17 versions, 20 files, 702.583 frags, 3.263 blks, 7.826.948.327 bytes (7.29 GB)
-------------------------------------------------------------------------
< Ver > < date > < time > < added > <removed> < bytes added >
-------------------------------------------------------------------------
00000001 2023-02-09 17:55:38 +00000003 -00000000 -> 6.654.630.534
00000002 2023-02-09 18:47:44 +00000001 -00000000 -> 17.394.282
00000003 2023-02-11 13:40:51 +00000001 -00000000 -> 252.707.222
00000004 2023-02-11 17:09:04 +00000001 -00000000 -> 74.337.419
00000005 2023-02-11 17:43:25 +00000001 -00000000 -> 15.669.831
00000006 2023-02-11 19:04:01 +00000001 -00000000 -> 17.670.788
00000007 2023-02-12 00:00:01 +00000001 -00000000 -> 72.951.575
00000008 2023-02-12 08:00:01 +00000001 -00000000 -> 94.408.432
00000009 2023-02-12 16:00:01 +00000001 -00000000 -> 89.868.811
00000010 2023-02-13 00:00:01 +00000001 -00000000 -> 89.430.987
00000011 2023-02-13 08:00:01 +00000001 -00000000 -> 84.165.485
00000012 2023-02-13 16:00:01 +00000001 -00000000 -> 91.936.236
00000013 2023-02-13 17:30:36 +00000002 -00000000 -> 16.994.079
00000014 2023-02-14 00:00:01 +00000001 -00000000 -> 92.889.622
00000015 2023-02-14 08:00:01 +00000001 -00000000 -> 99.721.454
00000016 2023-02-14 15:45:30 +00000001 -00000000 -> 57.182.470
00000017 2023-02-14 16:00:01 +00000001 -00000000 -> 4.989.068
Today's update: at every run gets only some MBs of space (e-mails are generally small, deduplicable and compressible)
(100%) 100.00% 00:00:00 ( 40.00 GB)->( 19.43 MB) of ( 40.00 GB) 93.10 MB/se1 +added, 0 -removed.
7.951.716.338 + (42.954.981.615 -> 254.326.790 -> 22.662.906) = 7.974.379.244 @ 92.96 MB/s
37720: running Destroy snapshot (if any)
38162: x_one zfs destroy zp0/zd0@francoproxmox
441.390 seconds (000:07:21) (all OK)
zfsproxrestore
The corresponding restore command, of course with the same "expectations"
Restore proxmox backups (on local storage) into /var/lib/vz and /etc/pve/qemu-server
Without file selection it restores everything; files can be a sequence of VM IDs (e.g. 200 300)
- -kill Remove snapshot (after backup)
- -not Do not restore (exclude)
Restore all VMs zfsproxrestore /backup/allvm.zpaq
Restore 2 VMs: 200 and 300 zfsproxrestore /backup/allvm.zpaq 200 300
Restore VM 200, release snapshot zfsproxrestore /backup/allvm.zpaq 200 -kill
Restore all VMs, except 200 zfsproxrestore /backup/allvm.zpaq -not 200 -kill
pre-compiled binaries
In this release I put some binaries for various platforms, I am checking in particular Synology-Intel-based NAS
zpaqfranz_linux "should" run just about everywhere (on 64-bit Intel systems); zpaqfranz_qnap_intel on (just about) every 32-bit-Intel-based Linux-like system
Windows 32/64 binary, 64 bit-HW accelerated
Windows 32/64 binary, HW accelerated, ESXi, Linux, Free/Open BSD
This is the first release (to be tested) of the new 57 series
Like any first release it has no bugs, but features :)
Remember: the more feedback (even negative) I receive, the greater the likelihood of improving zpaqfranz. And do not forget, please, the star on github or a review on sourceforge :)
The main difference is the internal refactoring (which can cause subtle problems in parameter/switch recognition), and especially the inclusion of a new metadata storage "package"
The new V3 (testable for now with whirlpool and highway) stores additional useful information (or rather, will) and in the future also some sort of posix-style data
In short - to summarize - to facilitate restoration on *nix of symlinks, file ownership etc., similar to tar
So externally the changes look modest, but internally they are numerous
New hashers that can be used inside archive: whirlpool, highway 64/128/256
In addition to the supported control hashes, within the archives, such as XXHASH|SHA-1|SHA-2|SHA-3|MD5|XXH3|BLAKE3, you can now choose
- whirlpool. It is a "slow" hash that creates very large footprints, but it is based on a completely different technology than the others. I like it very much
- highway. A hash developed by two very good programmers on github, actually not designed for use with large amounts of data (like zpaqfranz), but rather for (relatively) "small" packet indexing. In zpaqfranz there are 3 different "versions" (actually it is the same hash, so there is no difference in speed) for 64, 128 and 256 bits: most useful, as you can understand, for quick debugging with different-length hashes. The implementation is "straight" C (no AVX2 acceleration etc.), and is not tested (at present) on BIG ENDIAN, sparc or other "strange" systems. In fact, a debug tool
zpaqfranz a z:\1.zpaq c:\nz -whirlpool
zpaqfranz a z:\1.zpaq c:\nz -highway64
zpaqfranz a z:\1.zpaq c:\nz -highway128
zpaqfranz a z:\1.zpaq c:\nz -highway256
zpaqfranz l z:\1.zpaq -checksum
The dir command is now better than... dir
As is well known, or maybe not, if the zpaqfranz executable is renamed "dir" it works, roughly, like the Windows dir command
I use it a lot on Linux and FreeBSD, where the ls command doesn't look anything like what is needed by a storage manager (you need numerous other commands, hard-to-remember concatenations etc.)
zpaqfranz dir c:\*.cpp /s /os
dir c:\*.cpp /s /os -n 100
will show all the "*.cpp" in c:\ (with recursion, /s), ordered by size (/os) and limited to 100 entries (-n 100)
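For comparison, the kind of *nix pipeline the dir command saves you from remembering (a sketch, assuming GNU find for -printf):

```shell
# Recursive *.cpp listing under $dir, sorted by size, limited to 100 rows
dir=${dir:-.}
find "$dir" -name '*.cpp' -printf '%s %p\n' | sort -n | head -n 100
```

Three commands chained together, versus a single familiar dir syntax.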
zfsbackup a bit evolved
During use with simple datasets, so far, zfsbackup seems to work better than expected. Clearly this is referred to systems with zfs (linux+openzfs, FreeBSD, Solaris)
The -kill switch will delete temporary files (otherwise you need to manually "purge" the /tmp folder)
sparc64: -DALIGNMALLOC
For sparc64 an experimental switch to try to align malloc(): must be used at compile time
Haiku OS https://www.haiku-os.org/
Yep, zpaq 7.15 is already in Haiku OS :)
zpaqfranz, not yet thoroughly tested, can be compiled on Haiku R1/beta4, 64 bit (gcc 11.2.0), hrev56721 (maybe the -pthread is redundant, but not a big deal)
g++ -O3 -Dunix zpaqfranz.cpp -o zpaqfranz -pthread -static
TrueNAS
It is an appliance based on FreeBSD 13x, which, however, lacks the compiler.
zpaqfranz can run inside a GUI-made jail, or outside (i.e. in the normal /usr/local/bin). The second case, of course, enables every function (including zfs backups), but the binary has to be "injected" manually (with an SSH session, for example). Maybe I'll write a little HOW-TO
Comment on the source
The curious will see that the (partial) refactoring is "strange," as it does not use very convenient features (e.g., RTTI) that would make it more compact and elegant. This is because of the inability to rely on "modern" compilers, in short for backward compatibility. They will also notice that the handling of Boolean flags is peculiar: the reason is performance within highly CPU-bound loops, as in zpaqfranz. They will note that sometimes maps are used where unordered_map would be more efficient. But I can't, because they simply don't exist on certain systems (!). In short, it is the best tradeoff I have found (so far) between conciseness, maintainability, and breadth of supported platforms. Sometimes even to_string or atoi does not exist :)
The binaries
In this very first release there are
- zpaqfranz.exe (64bit Windows)
- zpaqfranz32.exe (32bit Windows)
- zpaqfranzhw.exe (64bit Windows w/HW SHA-1 acceleration via -hw switch, usually for AMD)
- zpaqfranz_esxi (32bit vSphere "maybe-will-run")
- zpaqfranz_freebsd (64bit statically linked)
- zpaqfranz_linux (64bit statically linked)
- zpaqfranz_openbsd (64bit statically linked)
- zpaqfranz_qnap (QNAP NAS TS-431P3 Annapurna AL314)
Of course the "right" way I recommend is to download the source and compile directly from scratch.
I attach them because they are convenient for me to do quick tests on as many systems as I can get
Windows 32/64 binary, FreeBSD statically linked
First (public) release with zfsbackup-zfsreceive-zfsrestore
NOTICE. There are virtually no tests on input parameters, so use caution. If you need help just ask. In the next release I will include stringent checks
And now... THE SPIEGONE!
This release contains numerous features, both commonly used and specific to zfs
The first function is versum, something similar to a "smarter" hashdeep
Basically: verifies the hashes of the files in the filesystem, against a list in a text file
We want to verify that backup-restore with zfs works well, without taking it on trust
It can be fed with two types of files: those created by zpaqfranz itself, and those of hashdeep.
The former are written by the sum command with the appropriate switches (e.g. zpaqfranz sum *.txt -xxh3). BTW zpaqfranz can both write and read (-hashdeep switch) this file format.
In the following examples we will operate on the tank/d dataset with SSD/NVMe, working on fc (yep, francocorbelli snapshot)
You can use all hash types of zpaqfranz, in this example it will be xxhash64
Incidentally, -forcezfs is used to have the example folder examined (it contains .zfs, being a snapshot); otherwise zpaqfranz would ignore it
zpaqfranz sum /tank/d/.zfs/snapshot/fc -forcezfs -ssd -xxhash -noeta -silent -out /tmp/hash_xx64.txt
A possible alternative, to have third-party control (i.e. software other than zpaqfranz) is to use hashdeep
usually in the md5deep package
The essential difference between hashdeep and md5deep is the use of multithreading: it reads files from disk in parallel, so it assumes we are operating with solid-state disks (or... wait longer :) )
Various hashes can be selected, but since they are basically used as checksums and not as cryptographic signatures, md5 is more than fine (it is the fastest), at least for me
hashdeep -c md5 -r /tank/d/.zfs/snapshot/fc >/tmp/hashdeep.txt
BTW hashdeep does not have a find/replace function; awk or sed is commonly used. Uncomfortable, to say the least
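What the -find/-replace pair does to each line of a hash list can be sketched with sed (the hash and paths here are hypothetical examples; forward slashes are used for readability):

```shell
# Rewrite the snapshot path in a hash-list line to the restored path
line='6c68c17f625af11aa734e8d122241789  /tank/d/.zfs/snapshot/fc/etc/rc.conf'
fixed=$(printf '%s\n' "$line" | sed 's|/tank/d/.zfs/snapshot/fc|z:/uno/_tank/d|')
printf '%s\n' "$fixed"
```

zpaqfranz applies exactly this kind of substring substitution internally, which is why no awk/sed gymnastics are needed with versum.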
To check the contents of the filesystem we have three options
- the zpaqfranz hashlist
In this example, a multithreaded operation (-ssd) will be adopted, applying a rename (-find/-replace) to convert the paths in the source file to the target ones
zpaqfranz versum z:\uno\_tmp\hash_xx64.txt -ssd -find /tank/d/.zfs/snapshot/fc -replace z:\uno\_tank\d
- the hashdeep
zpaqfranz is able to 'understand' the original format of hashdeep (look at the -hashdeep switch)
zpaqfranz versum z:\uno\_tmp\hashdeep.txt -hashdeep -ssd -find /tank/d/.zfs/snapshot/fc -replace z:\uno\_tank\d
- small-scale test, without reading from filesystem
If the hash function used to create the .zpaq file is the same as that of the .txt control file, you can operate it as follows
zpaqfranz versum z:\uno\_tmp\hash_xx64.txt -to thebak.zpaq -find /tank/d/.zfs/snapshot/fc -replace /tank/d
It should be remembered that the default hash of zpaqfranz is xxhash64, so if you want to use other hashes (e.g. xxh3, sha256 or sha3 etc.) you must, when creating the .zpaq file (the a command), add the relevant switch (e.g. -xxh3, -sha3, -blake3 etc.)
Recap
Complete example of creating an archive (on FreeBSD with zfs) to be then extracted on Windows with independent control
The source will be tank/d using -ssd for multithread
Take the snapshot fc of tank/d
zfs snapshot tank/d@fc
Get the hash list with xxhash64 into the file /tmp/hash_xx64.txt
zpaqfranz sum /tank/d/.zfs/snapshot/fc -forcezfs -ssd -xxhash -noeta -silent -out /tmp/hash_xx64.txt
Create hashdeep.txt w/md5 into /tmp/hashdeep.txt. Using md5 because very fast
WARNING: /sbin/zfs set snapdir=hidden tank/d may be required to "hide" .zfs folders from hashdeep. There is no easy way to exclude folders in hashdeep
hashdeep -c md5 -r /tank/d/.zfs/snapshot/fc >/tmp/hashdeep.txt
Now making the backup (fixing path w/-to)
In this case the default hash function is used (xxhash), matching with hash_xx64.txt
We "inject" the two hash lists, /tmp/hash_xx64.txt and /tmp/hashdeep.txt, to keep them with the archive
zpaqfranz a /tmp/thebak.zpaq /tank/d/.zfs/snapshot/fc /tmp/hash_xx64.txt /tmp/hashdeep.txt -to /tank/d
Destroy the snapshot
zfs destroy tank/d@fc
Now transfer somehow thebak.zpaq to Win (usually with rsync)
Extracting everything to z:\uno (look at -longpath)
zpaqfranz x thebak.zpaq -to z:\uno -longpath
Verify files by zpaqfranz's hash list
Note the -find and -replace to fix source (on FreeBSD) and destination (on Windows) paths
zpaqfranz versum z:\uno\_tmp\hash_xx64.txt -ssd -find /tank/d/.zfs/snapshot/fc -replace z:\uno\_tank\d
Now paranoid double-check with hashdeep.
Please note the -hashdeep
zpaqfranz versum z:\uno\_tmp\hashdeep.txt -hashdeep -ssd -find /tank/d/.zfs/snapshot/fc -replace z:\uno\_tank\d
Finally compare the hashes into the txt with the .zpaq
zpaqfranz versum z:\uno\_tmp\hash_xx64.txt -to thebak.zpaq -find /tank/d/.zfs/snapshot/fc -replace /tank/d
Short version: this is an example of how to verify, on a completely different system (Windows), a copy made from a .zfs snapshot with zpaqfranz. We will see how, in reality, it is designed for "real" zfs backup-restore
New advanced option: the -stdout
If the files are ordered-stored into the .zpaq, it is possible to -stdout
WHAT?
Files stored within .zpaq are divided into fragments (let's say 'chunks') which, in general, are not sorted.
This happens for various reasons (I will not elaborate), preventing the possibility of extracting files in stream form, i.e. as a sequence of bytes, as required by -stdout
This is not normally a serious problem (for zpaq 7.15) as it simply does not support mixing streamed and journaled files into an archive
Translation of the translation (!)
zpaq started out as a stream compressor (actually no, there would be a further very long explanation here that I will spare)
It processes any long sequence of bytes, one byte at a time, and writes a sequence of bytes in output: this is the so-called streamed format
It was present in older versions of zpaq, something analogous to gz just to give a known example.
Subsequently, the developer of zpaq (Matt Mahoney) implemented the so-called 'journaled' storage format, where each file has its various versions in it.
This is the 'normal' format, while the 'streamed' one has practically disappeared (vestiges remain in the source).
For a whole series of technical problems that I won't go into here, Mahoney decided not to allow the mixing of the two types:
- archives WITH VERSIONS (aka: modern)
XOR
- with streamed files (aka: OK for stdout)
The ability to write to stdout does not have much reason to exist, unless coupled with the ability to read from stdin, and zpaq 7.15 does not allow this, essentially operating by reading files from the filesystem "the usual way".
As you may have noticed (?) for some time now, I have instead evolved zpaqfranz to allow the processing of input streams (with -stdin)
The concrete reason is twofold
The first is archiving mysql dumps, whose tools (mysqldump and various similar ones) output precisely a text file.
This way, you can use zpaqfranz to archive them versioned (which as far as I know is superior to practically any other system, by a WIDE margin).
The second is to make sector-level copies of Windows drives, in particular the C disk:
As you may have noticed (?) zpaqfranz is now able to back up (from within Windows) an entire system, either 'file-based' (with a VSS) or 'dd-style'
Obviously the 'dd' method will take up more space (it is good to use the f command to fill the free space with zeros) and will also be slower
BUT
it allows you to mount (with other software) / extract (e.g. with 7z) almost everything
If you are really paranoid (like me), what could be better than a backup of every sector of the disk?
Let us now return to why it is so important (and actually not trivial) to obtain archives in journaled format but with the possibility of orderly (=streamed) extraction
It is about speed
Streamed .zpaq archives exist, BUT listing (i.e. enumerating the list of files therein) is very slow, requiring a scan of the entire file (which can be hundreds of GB = minutes)
They are also extremely slow to create (with the zpaqd 7.15 utility), essentially single-threaded (~10MB/s)
Instead, by having journaled (i.e. 'normal' zpaq format) but ORDERED archives, I can obtain all the benefits
Various versions of the same file, listing speed, and even creation speed (maintaining multithreading), at the cost of a (not excessive) slowdown in output, due to the use of -stdout instead of the more efficient filesystem
Why all this?
For zfs backups, of course, and especially restorations
There are now three new commands (actually very crude, to be developed, but that's the "main thing")
- zfsbackup
- zfsrestore
- zfsreceive
One normally uses .zpaq to make differential zfs backups, i.e. with one base file and N differential files, which are stored as different versions. This is good, it works well, and it is not fragile (differential means that two files are enough to restore). The "normal" method for "older" zpaqfranz.
BUT
it takes up space: as the differential snapshots get bigger and bigger, it is a normal problem for any differential system
On the other hand, using incremental zfs snapshots has always been very risky and fragile, because it only takes smal...
Windows 32/64 binary
Windows 32/64 binary
Windows imaging (first release)
Sector-level image of Windows partition (admin rights required) by internal imager or... dd (!)
Yes, now there is a (GNU coreutils) dd embedded in the Windows executable (used to test the buffered -stdin)
Suggestion: fill to zero unused space before imaging (save space)
zpaqfranz f c:\ -zero
How to extract the zpaq archive after formatting the C partition?
Simply, you can't :)
Imaging of partitions (of course C too) is NOT (yet) something like Acronis or Macrium
It is (or should be) a full-backup that, in case of emergency, you need to
- Restore (extract from .zpaq the .img)
- Mount with something else (e.g. OSFMount https://www.osforensics.com/tools/mount-disk-images.html), then copy-and-paste your files
- OR open the .img with 7zip (supposing NTFS format), then extract with 7zip
- OR write back the image with "something" (e.g. dd) into a virtual machine, or even the "real" HW (booting from a USB key, for example)
work in progress...
Switch -dd (Windows)
Make an image with dd, via a script (beware of antivirus)
zpaqfranz a z:\2.zpaq c: -dd
Two additional parameters: -minsize bs and -maxsize count (just like dd)
Other add() parameters cannot be used, except -key (for encryption)
Next releases: dd-over-VSS
Switch -image (Windows)
Use the internal imager to back up a partition. It is possible to use almost all "normal" add switches PLUS the new -buffer, useful for SSDs
zpaqfranz a z:\2.zpaq e: -image
zpaqfranz a z:\2.zpaq c: -image -buffer 1MB -key pippo
-buffer X switch (in add)
Use a larger input buffer (zpaq's default is 4KB), typically 64KB or 1MB
-image switch in extraction (Windows)
Restore huge images w/smart progress, by default every 20MB or -minsize X
print_progress() cannot handle huge files (e.g. vmdks) due to seek-and-write on
filesystems without "smartness"
Example: seek @ 300GB, then write 1KB:
the FS must write 300GB of zeros, then 1KB
With a slow spinning drive this can seem a "freeze"
=> when extracting this kind of file use -image and even -minsize something
If something is 1, all writes (and seeks) will be shown
zpaqfranz x copia.zpaq -to z:\prova\ -image
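The seek-and-write pattern described above can be demonstrated in miniature (temporary file; sizes are small stand-ins for the 300GB case):

```shell
f=$(mktemp)
# Seek ~100MB into an empty file, then write a single 1KB block.
# A "smart" filesystem makes this a sparse file instantly; a "dumb" one
# must materialize all the intervening zeros first - hence the "freeze".
dd if=/dev/zero of="$f" bs=1024 count=1 seek=102400 2>/dev/null
size=$(wc -c < "$f")   # apparent size: 102401 * 1024 bytes
echo "$size"
rm -f "$f"
```

The apparent size grows by the whole seek distance even though only 1KB was written, which is exactly why a naive progress display misleads during such extractions.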
buffered -stdin
Much faster, good for archiving mysqldump piping into zpaqfranz
Minor fixes
- On -NOJIT (e.g. Apple M1) shows more info on failed allocx
- Extended Windows error decoder
References:
stdin
Apple M1
Imaging
If you want please leave a review on Sourceforge
and put a star on github (if you haven't already).
Any comment or suggestion is welcome. Thanks for collaboration
Windows 32/64 binary
First public release with (a bit of) support for cloudpaq
The next "big thing" is zpaq-over-ssh, or (better), zpaqfranz-over-TCP-socket
Some more
zpaqfranz can send an (encrypted) copy of the archive via "internet" (TCP) to a remote server, running the new (soon-to-be-released) cloudpaq
The server will make a (lot) of checks before updating the local version of the archive
Of course all this "mess" is for a ransomware-resilient archiver
It is rather hard to implement: currently it works with an entire file, but sending subsequent updates still needs to be implemented
There is definitely an overkill in the security methods used - in practice I expect to use it over an ssh tunnel, so authentication and encryption issues are actually redundant - but after all it is a hobby.
However, in my spare time maybe I will complete it :)
News against 55.x
-flagbig
Shows a BIG ASCII-text in the result. Useful for results sent by crontabbed e-mail
C:\zpaqfranz>zpaqfranz t z:\1.zpaq -big
zpaqfranz v56.1j-JIT-L (HW BLAKE3), SFX64 v55.1, (15 Nov 2022)
franz:-big
Archive seems encrypted (or corrupted)
Enter password :***
z:/1.zpaq: zpaqfranz error:password incorrect
23013: zpaqfranz error: password incorrect
1.453 seconds (000:00:01) (with errors)
####### ###### ###### ####### ###### ###
# # # # # # # # # ###
# # # # # # # # # ###
##### ###### ###### # # ###### #
# # # # # # # # #
# # # # # # # # # ###
####### # # # # ####### # # ###
-checktxt
After an "add", create a file with a full xxh3 hash of the archive (note: without a parameter it takes the same name as the archive, with a .txt extension)
zpaqfranz a z:\knb.zpaq c:\nz\ -checktxt z:\pippo.txt
(...)
zpaqfranz v56.1j-JIT-L (HW BLAKE3), SFX64 v55.1, (15 Nov 2022)
Creating XXH3 check txt on z:/pippo.txt
44202: final XXH3: hash 6C68C17F625AF11AA734E8D122241789
Now you can send the .txt file (with rsync, in the future with cloudpaq) to a remote server, then do a quick check
C:\zpaqfranz>zpaqfranz sum z:\knb.zpaq -checktxt z:\pippo.txt -big
zpaqfranz v56.1j-JIT-L (HW BLAKE3), SFX64 v55.1, (15 Nov 2022)
franz:-big
franz:checktxt <<z:/pippo.txt>>
Checking XXH3 (because of -checktxt) on z:/pippo.txt
Hash from checktxt |6C68C17F625AF11AA734E8D122241789|
0.063 seconds (00:00:00) (all OK)
####### # #
# # # #
# # # #
# # ###
# # # #
# # # #
####### # #
Now - on the remote server - something like
/usr/local/bin/zpaqfranz dir "/home/pizza/copie/" -noeta >/tmp/checkpizza.txt
/usr/local/bin/zpaqfranz sum /home/pizza/copie/$1 -big -noeta -checktxt /home/pizza/copie/$2 >>/tmp/checkpizza.txt
if [ -f /tmp/checkpizza.txt ]; then
/usr/local/bin/smtp-cli --missing-modules-ok (...) -cc $3 -subject "CHECK-pizza-backup" -body-plain=/tmp/checkpizza.txt
fi
Short version: quickly compare a local archive with a rsync-ed remote one, taking the free space (of the remote server) too, by e-mail
-stdin
During add it is now possible to read from stdin. Typically to capture mysqldump's output, or a full image of a live Windows (!) via dd
zpaqfranz a z:\1.zpaq mydump.sql -stdin
Or...
c:\nz\dd if="\\\\.\\c:" bs=1048576 count=100000000000 |c:\zpaqfranz\zpaqfranz a j:\image\prova cimage.img -stdin
-windate (on Win)
Add/restore the file's creation date (if any). Please note: this will force the xxhash64 hash [the default]
C:\zpaqfranz>zpaqfranz a z:\2 *.cpp -windate
zpaqfranz v56.1j-JIT-L (HW BLAKE3), SFX64 v55.1, (15 Nov 2022)
franz:winhash64 (-windate)
Creating z:/2.zpaq at offset 0 + 0
(...)
0.250 seconds (00:00:00) (all OK)
C:\zpaqfranz>zpaqfranz x z:\2.zpaq -to z:\kajo -windate
-all -comment (in x, extract)
Mark extracted versions with ASCII comment, if present
zpaqfranz x copia.zpaq -to z:\prova\ -all -comment
range in extraction
Extract the files added from version X to Y, or from 1 to X, or from X until the end. Something like github (!)
In this example I want to get all the different comp.pas source code, from version 100 to 1000
zpaqfranz x copia.zpaq -only *comp.pas -to z:\allcomp -all -range 100-1000
in utf command -fix255
To quickly find over-long file names in a folder
zpaqfranz utf c:\vm -fix255
-isopen on a (add)
Quickly abort if the file is already opened (on Windows). For multi-virtual-machine backups into the same archive
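A sketch of the intended use (the .vmdk path is a hypothetical example): when several backup jobs target the same archive, -isopen makes a job bail out instead of archiving a disk a hypervisor still holds open:

```shell
REM Abort at once if the virtual disk is still opened by the hypervisor
zpaqfranz a z:\vms.zpaq c:\vm\guest1\disk.vmdk -isopen
```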
fixes
- On Windows with longpath, paths are now sanitized better
- Better (but not perfect) handling for OneDrive folder #37
- Skip .zfs by default for dir mode
- Extended -debug output (trying to catch really weird NTFS "somethings")
- Fixed output