
compression

holzkohlengrill edited this page Dec 15, 2023 · 16 revisions
  1. Compressing & Archiving Files
    1. tar + companion programs (xz, bzip2)
      1. Compress xz/bzip2
        1. Useful xz options
      2. Decompress
      3. List metadata
      4. Notes
      5. See also
    2. tar + lz4
      1. Compress lz4
      2. Extract/Decompress lz4
    3. 7zip (sudo pacman -S p7zip)
      1. Usage
        1. Add files to new archive
        2. List archive contents
      2. (De-)compression ratio/speed, ...
      3. Compression methods
      4. Compression levels (.7z)
      5. See also
    4. Compatibility

Compressing & Archiving Files

Difference between compression and archiving:

| Kind | Behaviour |
|---|---|
| Archiving only | Multiple files => one file (no compression) |
| Compression only | Multiple files => multiple compressed files |
| Archiving & compression | Multiple files => one compressed file |

=> The most widespread tools are listed on the Arch Linux Wiki

Before compressing large amounts of data, it is worth doing a dry run with cp or rsync to check for invalid filenames and similar problems. The extra time pays off, since the compression itself may take hours or days on a standard PC.
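A minimal sketch of such a dry run with rsync, assuming rsync is installed. The directories `src` and `scratch` are placeholders created only for this illustration. A dry run mainly surfaces problems rsync hits while walking the tree, so treat it as a quick sanity check rather than a guarantee.

```shell
# Placeholder source tree and an empty scratch target (illustration only)
src=$(mktemp -d)
scratch=$(mktemp -d)
printf 'hello\n' > "$src/file01.txt"

if command -v rsync >/dev/null 2>&1; then
    # -a: archive mode (recursive, keeps metadata)
    # -n: dry run, nothing is actually copied
    # --itemize-changes: print one line per item that would be transferred
    rsync -a -n --itemize-changes "$src"/ "$scratch"/
    dry_run_status=$?
else
    echo "rsync is not installed; no dry run performed" >&2
    dry_run_status=127
fi

# A non-zero status means rsync reported errors that are worth fixing first
echo "dry run exit status: $dry_run_status"
```

For a real job, point `src` at the data to be archived and read through any errors before starting the compression.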

tar + companion programs (xz, bzip2)

Compress xz/bzip2

#            vv-- globbing
tar cf - file0* | xz -T 0 -4 -vv - > test-new.tar.xz
#      ^--- no archive file specified because of the pipe (see also: https://unix.stackexchange.com/a/41829/116710)

# Is equivalent to:
#                          v-- use xz (J: xz, j: bzip2)
XZ_OPT="-T 0 -4 -vv" tar cfJ test-new.tar.xz file0*
# ^^^^       ^^--- compression preset/level
# ||||
# xz argument env var (for simple args you can write: XZ_OPT=-e9)
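The heading also mentions bzip2, so here is a hedged sketch of the same two variants with bzip2 instead of xz. It assumes bzip2 is installed; the sample files and archive names are made up for the example.

```shell
work=$(mktemp -d)
printf 'sample data\n' > "$work/file01.txt"
printf 'more sample data\n' > "$work/file02.txt"

(
    cd "$work" || exit 1

    # Pipe variant: tar writes the archive to stdout, bzip2 compresses stdin
    tar cf - file0* | bzip2 -9 > test-pipe.tar.bz2

    # Built-in variant: j selects bzip2
    tar cfj test-new.tar.bz2 file0*

    # List the members of both archives; the two listings should match
    tar tf test-pipe.tar.bz2
    tar tf test-new.tar.bz2
)
```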

Useful xz options

  • -T: Number of worker threads (0: use as many threads as there are CPU cores)
  • -[0...9]: Compression preset/level (0: fastest; 9: slowest, best compression; xz can scale the settings down if they would exceed the memory usage limit)
  • -v: Verbose (shows (un-)compressed sizes, compression ratio)
  • -vv: Very verbose (additionally shows the memory required for (de-)compression and the thread count)
  • -e: Extreme (trades CPU time for a better compression ratio; decompression memory does not increase)
  • -M/--memory: Limit memory usage, e.g. -M 70%, -M 800MiB, ... (-M 0 resets to the default; the 40% default applies to old xz versions, newer releases default to no hard limit)
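As a rough illustration, the options above can be combined in one call on a single file. The sample file is generated just for this sketch. `-k` (keep the input file) and `-t` (test integrity) are not in the list above; they are standard xz options added here for convenience.

```shell
work=$(mktemp -d)

# Highly compressible placeholder input
yes 'some repetitive sample line' | head -n 20000 > "$work/sample.txt"

# -k: keep the input file   -T 0: one thread per core
# -6 -e: preset 6, extreme  -M 70%: cap memory usage at 70% of RAM
# -vv: print sizes, ratio and memory requirements
xz -k -T 0 -6 -e -M 70% -vv "$work/sample.txt"

# Check the integrity of the result and show its metadata
xz -t "$work/sample.txt.xz"
xz -l "$work/sample.txt.xz"
```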

Decompress

# Auto-detect compression
tar xf test.tar.xz
# Explicitly state xz archive
tar xfJ test.tar.xz
# Decompress into PREEXISTING folder
tar xf test.tar.xz -C test-decomp
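Since `-C` expects the target folder to exist, a small end-to-end sketch looks like this. The file and folder names are placeholders created in a temporary directory.

```shell
work=$(mktemp -d)
printf 'sample data\n' > "$work/file01.txt"

# Build a small archive to extract (the first -C makes tar pick up the file from $work)
tar cJf "$work/test.tar.xz" -C "$work" file01.txt

# Create the target folder first, then extract into it
mkdir -p "$work/test-decomp"
tar xf "$work/test.tar.xz" -C "$work/test-decomp"

ls -l "$work/test-decomp"
```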

List metadata

xz -l test.tar.xz
Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1       1    690.9 KiB    940.0 KiB  0.735  CRC64   test.tar.xz

Notes

  • xz dict size and compression levels/presets are the same as lzma2 (see below)
  • xz uses lzma(2)
  • tar: with the dash syntax, f must be the last option in the bundle because the archive name has to follow it directly (-cfJ does not work; cfJ and -cJf work)
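The note on the position of f can be checked with a small experiment. This is a sketch assuming GNU tar or a compatible implementation; the file names are placeholders.

```shell
work=$(mktemp -d)
printf 'sample data\n' > "$work/file01.txt"

(
    cd "$work" || exit 1

    # Works: f comes last, so the next word is taken as the archive name
    tar -cJf good.tar.xz file01.txt

    # Does not do what was intended: f takes "J" as the archive name and
    # bad.tar.xz is treated as an input file, which does not exist
    tar -cfJ bad.tar.xz file01.txt 2>/dev/null || echo "tar -cfJ failed"

    ls -l
)
```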

See also

tar + lz4

lz4 is interesting because it offers a good balance between compression ratio and (de-)compression speed.

Compress lz4

tar cf - /input/path/ | lz4 -6 - /output/archive.tar.lz4
#                           ^^-- Compression level [1; 12]; I only recommend levels from 1 to ~6 (higher levels are slow for the compression ratio they achieve)

Extract/Decompress lz4

# -d: decompress, -c: write the result to stdout so that it can be piped into tar
lz4 -dc ./input.tar.lz4 | tar xvf -
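Putting both directions together, a round trip can be sketched as follows. It assumes lz4 is installed; all paths are placeholders inside a temporary directory.

```shell
work=$(mktemp -d)
mkdir -p "$work/input" "$work/restore"
printf 'sample data\n' > "$work/input/file01.txt"

# Compress: tar streams the folder to stdout, lz4 reads stdin ("-") and writes the archive
tar cf - -C "$work" input | lz4 -6 - "$work/archive.tar.lz4"

# Decompress: lz4 writes the tar stream to stdout (-c), tar unpacks it into restore/
lz4 -dc "$work/archive.tar.lz4" | tar xf - -C "$work/restore"

# Compare the restored tree with the original
diff -r "$work/input" "$work/restore/input" && echo "round trip OK"
```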

7zip (sudo pacman -S p7zip)

Personally, I had problems with 7zip for very large folders (> 500 GiB). I moved to tar + lz4 for such use cases.

Usage

Add files to new archive

# Add files to a new archive
#                               files to compress (globbing is allowed)
#                               vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
7z a -m0=lzma2 -mx=7 backups.7z file1.txt aFolder file1.csv blahFiles*
#  ^ ^^^^^^^^^ ^^^^^ ^^^^^^^^^^
#  | |         |     archive name
#  | |         compression level
#  | compression method
#  add

List archive contents

7z l ./backups.7z
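For completeness, a hedged sketch of the full cycle: create, list, test and extract. `7z t` (test integrity) and `7z x` with `-o` (extract with full paths into a folder) are standard 7z commands that the usage section above does not cover. File and folder names are placeholders.

```shell
work=$(mktemp -d)
printf 'sample data\n' > "$work/file1.txt"

(
    cd "$work" || exit 1

    # Create the archive with LZMA2 at level 7
    7z a -m0=lzma2 -mx=7 backups.7z file1.txt

    # List the contents and test the archive integrity
    7z l ./backups.7z
    7z t ./backups.7z

    # Extract into ./restore (no space between -o and the folder name)
    7z x ./backups.7z -o./restore
)
```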

(De-)compression ratio/speed, ...

https://quixdb.github.io/squash-benchmark/#results-table

Compression methods

  • LZMA
  • LZMA2
  • PPMd
  • BZip2
  • Deflate
  • Delta
  • BCJ
  • BCJ2
  • Copy

Compression levels (.7z)

| Level | Meaning | Dict size |
|---|---|---|
| 0 | Copy | -- |
| 1 | Fastest | 64 KB |
| 3 | Fast | 1 MB |
| 5 | Normal | 16 MB |
| 7 | Maximum | 32 MB |
| 9 | Ultra | 64 MB |

See also

Compatibility

| Format | 7z / 7zip | xz | OS |
|---|---|---|---|
| .tar.xz | r/-/m | r/w/m | Windows / Linux |
| .7z | r/w/- | r/-/- | Windows / Linux (no metadata) |

=> r/w/m: read/write/modify
