Voodoo deduplication

Franco Corbelli edited this page Sep 2, 2023 · 1 revision

zpaq uses a rather advanced method for deduplication. It is not the most efficient in the world, but it is reliable and has (rather) low RAM usage. Because it is based on SHA-1, it silently suffers from SHA-1 collisions. zpaqfranz detects those situations.

-fragment N

Set the dedupe fragment size range from 64 × 2^N to 8128 × 2^N bytes, with an average size of 1024 × 2^N bytes. The default is 6 (range 4096..520192, average 65536). Smaller fragment sizes can improve compression through deduplication of similar files, but require more memory and more overhead. Each fragment adds about 28 bytes to the archive and requires about 40 bytes of memory. For the default, this is less than 0.1% of the archive size.
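The arithmetic above can be checked with a short Python sketch. The function names `fragment_stats` and `index_memory_bytes` are made up for this illustration and are not part of zpaq or zpaqfranz. The 28-byte and 40-byte figures are the approximate values quoted above, and the overhead is computed against the average fragment size, so treat the results as rough estimates only.

```python
def fragment_stats(n: int = 6) -> dict:
    """Fragment size range and per-fragment archive overhead for -fragment N."""
    min_size = 64 << n      # 64 * 2^N
    avg_size = 1024 << n    # 1024 * 2^N
    max_size = 8128 << n    # 8128 * 2^N
    return {
        "min": min_size,
        "avg": avg_size,
        "max": max_size,
        # ~28 bytes added to the archive per average-sized fragment
        "archive_overhead_pct": 100.0 * 28 / avg_size,
    }


def index_memory_bytes(data_bytes: int, n: int = 6) -> int:
    """Approximate RAM for the fragment index, at ~40 bytes per fragment."""
    avg_size = 1024 << n
    fragments = -(-data_bytes // avg_size)  # ceiling division
    return fragments * 40


stats = fragment_stats(6)
print(stats["min"], stats["avg"], stats["max"])
print(round(stats["archive_overhead_pct"], 3))
print(index_memory_bytes(1 << 40, 6) // (1 << 20), "MiB")
```

For N = 6 this reproduces the 4096..520192 range with a 65536-byte average, an overhead of about 0.04% per fragment, and roughly 640 MiB of fragment index for 1 TiB of input under the same assumptions.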

Values other than 6 conform to the ZPAQ specification and will be decompressed correctly by all versions, but do not conform to the recommendation for best deduplication. Identical files added with different values of N will not deduplicate against each other, because the fragment boundaries will differ. For the same reason, list -summary will not identify these files as identical.
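To see why different values of N break deduplication, here is a toy content-defined chunker in Python. It is NOT zpaq's real fragmentation algorithm: the hash, the boundary rule and the names `toy_fragments` and `fragment_ids` are assumptions made for this sketch. Only the size limits (64 × 2^N minimum, 8128 × 2^N maximum, about 1024 × 2^N on average) follow the description above.

```python
import hashlib
import random


def toy_fragments(data: bytes, n: int) -> list:
    """Split data at content-defined boundaries (toy rule, not zpaq's)."""
    min_size, max_size = 64 << n, 8128 << n
    mask = (1024 << n) - 1  # a boundary is expected about every 1024 * 2^N bytes
    fragments, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * 31 + byte + 1) & 0xFFFFFFFF  # simple running hash of the fragment
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            fragments.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        fragments.append(data[start:])  # trailing partial fragment
    return fragments


def fragment_ids(data: bytes, n: int) -> set:
    """SHA-1 of every fragment: what a deduplicator would compare."""
    return {hashlib.sha1(f).hexdigest() for f in toy_fragments(data, n)}


rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(128 * 1024))  # one "file"

ids_n0 = fragment_ids(data, 0)  # the file added with N = 0
ids_n1 = fragment_ids(data, 1)  # the very same file added with N = 1
print(len(ids_n0), len(ids_n1), len(ids_n0 & ids_n1))
```

The same bytes are cut at different places for the two values of N, so the two sets of fragment hashes do not match, and most of the data would be stored a second time. With a single value of N the boundaries, and therefore the hashes, are identical.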

VERY short version: leave the default on :)
