GitHub - zholos/tar-sorted: sort files to improve tar compression

tar-sorted sorts files in an order designed to improve tar file compression:

- by extension and filename, so that similar files appear consecutively in the tar stream;
- by MD5 checksum, so that identical files appear consecutively in the tar stream.
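The exact combined sort key is not spelled out here. As a rough illustration only, the Python sketch below assumes one plausible scheme: files are grouped by MD5 digest so identical copies stay together, and the groups are ordered by the (extension, filename) of their earliest-sorting member. The helper names `md5_of`, `name_key` and `sorted_files` are hypothetical and are not taken from tar-sorted.py.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def name_key(path):
    """Assumed sort key: extension first, then filename, then full path as a tie-breaker."""
    base = os.path.basename(path)
    ext = os.path.splitext(base)[1]
    return (ext, base, path)


def sorted_files(paths):
    """Hypothetical ordering: group identical files by MD5, order the groups by
    the name key of their earliest member, and emit each group consecutively."""
    groups = defaultdict(list)
    for p in paths:
        groups[md5_of(p)].append(p)
    ordered = []
    for members in sorted(groups.values(), key=lambda m: min(name_key(p) for p in m)):
        ordered.extend(sorted(members, key=name_key))
    return ordered
```

For example, if `a.txt` and `d.py` hold the same bytes, this sketch places them next to each other even though their extensions differ, at the cost of pulling `a.txt` away from the other `.txt` files.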

Whether this actually helps depends on the data. The intended use is manual backups, specifically compressing multiple snapshots of the same filesystem (with many identical files across snapshots) into a single tar file.

By default, tar-sorted writes the sorted list of filenames to stdout, much like find:

tar-sorted -0 backups/* | tar cf - --null --no-recursion -T - | xz -9e >backups.tar.xz
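Judging by the --null option passed to tar, the -0 output is a list of NUL-terminated paths, which is what `tar --null -T -` reads. Because tar runs with --no-recursion, directories must appear in the list themselves, not only the files inside them. A minimal Python sketch of producing such a list follows; it assumes a plain depth-first walk (it does not reproduce tar-sorted's ordering) and assumes that unsupported file types are treated as errors. `walk_supported` and `write_null_separated` are hypothetical names.

```python
import os
import stat
import sys


def walk_supported(roots):
    """Yield each root and everything beneath it in depth-first order, with
    every directory listed before its contents. Only directories and regular
    files are accepted; any other type raises ValueError (an assumption)."""
    for root in roots:
        stack = [root]
        while stack:
            path = stack.pop()
            mode = os.lstat(path).st_mode
            if stat.S_ISDIR(mode):
                yield path
                # Push children in reverse so they are popped in sorted order.
                for name in sorted(os.listdir(path), reverse=True):
                    stack.append(os.path.join(path, name))
            elif stat.S_ISREG(mode):
                yield path
            else:
                raise ValueError(f"unsupported file type: {path}")


def write_null_separated(paths, out=None):
    """Write each path followed by a NUL byte, the format `tar --null -T -` reads."""
    out = out if out is not None else sys.stdout.buffer
    for p in paths:
        out.write(os.fsencode(p) + b"\0")
```

Calling `write_null_separated(walk_supported(["backups"]))` from a script would produce output that can be piped into the same `tar cf - --null --no-recursion -T -` command.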

It can also create a tar file directly:

tar-sorted -c backups/* | xz -9e >backups.tar.xz

When creating a tar file directly, mtimes are preserved but permissions are not. In either mode, only directories and regular files are supported.
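For the -c mode, a rough equivalent using Python's standard tarfile module could look like the sketch below. It keeps each entry's mtime but writes a fixed mode. The README does not say what mode tar-sorted actually stores, so `DIR_MODE` and `FILE_MODE` are hypothetical placeholders, as are the function names.

```python
import os
import stat
import sys
import tarfile

# Hypothetical fixed modes: the README says permissions are not preserved,
# but not what is stored instead.
DIR_MODE = 0o755
FILE_MODE = 0o644


def add_entry(tar, path):
    """Add one directory or regular file, keeping its mtime but not its mode.
    Ownership is left at the TarInfo defaults (uid/gid 0, empty names)."""
    st = os.lstat(path)
    # Archive name: forward slashes, no leading "/" (as GNU tar also does).
    info = tarfile.TarInfo(path.replace(os.sep, "/").lstrip("/"))
    info.mtime = int(st.st_mtime)
    if stat.S_ISDIR(st.st_mode):
        info.type = tarfile.DIRTYPE
        info.mode = DIR_MODE
        tar.addfile(info)
    elif stat.S_ISREG(st.st_mode):
        info.type = tarfile.REGTYPE
        info.mode = FILE_MODE
        info.size = st.st_size
        with open(path, "rb") as f:
            tar.addfile(info, f)
    else:
        raise ValueError(f"unsupported file type: {path}")


def write_tar(paths, out=None):
    """Write an uncompressed tar stream of `paths`, in the given order, to `out`."""
    out = out if out is not None else sys.stdout.buffer
    # "w|" writes a non-seekable stream, so stdout can be piped straight into xz.
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for p in paths:
            add_entry(tar, p)
```

Entries are built from `os.lstat` rather than `TarFile.gettarinfo`, because `gettarinfo` turns files that are already hardlinked on disk (common in snapshot backups) into link entries, which is a different behaviour.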

For two identical files that appear consecutively in the tar stream to be compressed efficiently, the xz dictionary must be at least as large as the file, so that the second copy can still refer back to the first. At -9 the dictionary size is 64 MB. The maximum possible compression setting is --lzma2=preset=9e,dict=1536m (a 1.5 GB dictionary), but it is slow and needs more than 15 GB of RAM.
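The dictionary-size effect can be reproduced on a small scale with Python's lzma module. This sketch compresses two identical blocks of random bytes back to back, once with a dictionary a quarter of the block size and once with one four times larger; only the larger dictionary lets the second copy be encoded as a match against the first. The block and dictionary sizes are arbitrary demo choices. `MAX_FILTERS` shows how the 9e/1536m setting above can be written as an lzma filter chain; it is defined but deliberately not used, given the memory it needs.

```python
import lzma
import random

# Filter chain corresponding to `xz --lzma2=preset=9e,dict=1536m`:
# preset 9 plus the "extreme" flag, with the dictionary raised to 1536 MiB.
MAX_FILTERS = [{"id": lzma.FILTER_LZMA2,
                "preset": 9 | lzma.PRESET_EXTREME,
                "dict_size": 1536 * 1024 * 1024}]


def xz_size(data, dict_size, preset=6):
    """Size of `data` compressed as .xz with an LZMA2 dictionary of `dict_size` bytes."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": preset, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters))


def duplicate_pair_sizes(file_size=64 * 1024, seed=0):
    """Compress two identical, incompressible 'files' back to back with a
    dictionary smaller than one file and with one larger than it."""
    blob = random.Random(seed).randbytes(file_size)  # Python 3.9+
    pair = blob + blob
    small = xz_size(pair, dict_size=file_size // 4)
    large = xz_size(pair, dict_size=file_size * 4)
    return small, large
```

With the small dictionary the output is roughly the size of both copies; with the large one it is roughly the size of one. In a real tar stream the distance between the copies is slightly more than the file size, since a tar header and padding sit between them.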

Another way to handle identical files is to store each subsequent copy as a hardlink to the first; tar-sorted -cl does this. However, parts of such a tar file cannot be extracted on their own (a hardlink entry needs the file it points to), and the extracted identical files will be hardlinked to each other, which may not be desired.
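A sketch of the hardlink approach with Python's tarfile module, assuming identical content is detected by MD5 as in the sorting step: the first file with a given digest is stored in full, and each later file with the same digest becomes a `LNKTYPE` entry whose `linkname` points to the first. `add_with_hardlinks` is a hypothetical name and this is not the code in tar-sorted.py.

```python
import hashlib
import os
import tarfile


def file_md5(path, chunk_size=1 << 20):
    """Return the raw MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()


def add_with_hardlinks(tar, path, first_by_digest):
    """Archive a regular file. The first file with a given MD5 is stored in
    full; later files with the same MD5 become hardlink entries pointing at it.
    `first_by_digest` maps digest -> archive name and is updated in place."""
    name = path.replace(os.sep, "/").lstrip("/")
    digest = file_md5(path)
    info = tarfile.TarInfo(name)
    info.mtime = int(os.stat(path).st_mtime)
    if digest in first_by_digest:
        # Equal MD5 is treated as identical content (an assumption).
        info.type = tarfile.LNKTYPE
        info.linkname = first_by_digest[digest]
        tar.addfile(info)  # link entries carry no data
    else:
        first_by_digest[digest] = name
        info.size = os.path.getsize(path)
        with open(path, "rb") as f:
            tar.addfile(info, f)
```

Because a link entry carries no data, extracting it alone cannot work without the member it names, which is the limitation described above. A more cautious version would compare the file bytes before linking instead of trusting the digest alone.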

This is experimental software. After creating the archive, please extract and verify its contents (try dedup -d).
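If dedup is not at hand, a generic Python check is also possible. The sketch below compares an extracted tree against the original and reports relative paths that are missing, extra, of a different type, or whose bytes differ; it does not compare mtimes or permissions. It is not a description of what `dedup -d` does, and `tree_differences` is a hypothetical helper name.

```python
import os


def relative_entries(root):
    """Map each path under `root` (relative to it) to "dir" or "file"."""
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            entries[os.path.relpath(os.path.join(dirpath, name), root)] = "dir"
        for name in filenames:
            entries[os.path.relpath(os.path.join(dirpath, name), root)] = "file"
    return entries


def same_bytes(p, q, chunk_size=1 << 20):
    """Compare two files byte for byte, reading both in chunks."""
    with open(p, "rb") as f, open(q, "rb") as g:
        while True:
            a, b = f.read(chunk_size), g.read(chunk_size)
            if a != b:
                return False
            if not a:  # both files ended at the same point
                return True


def tree_differences(original, extracted):
    """Return relative paths that are missing, extra, of a different type, or
    whose contents differ. An empty list means the trees match."""
    a, b = relative_entries(original), relative_entries(extracted)
    problems = sorted(set(a) ^ set(b))
    for rel in sorted(set(a) & set(b)):
        if a[rel] != b[rel]:
            problems.append(rel)
        elif a[rel] == "file" and not same_bytes(
            os.path.join(original, rel), os.path.join(extracted, rel)
        ):
            problems.append(rel)
    return problems
```

For example, after `mkdir check && tar -xJf backups.tar.xz -C check`, `tree_differences("backups", "check/backups")` should return an empty list.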
