xxHash v0.7.3
xxHash v0.7.3 is a major evolution for xxh3 and xxh128, with a focus on speed and dispersion performance.
Speed improvements
v0.7.3 pays a lot of attention to small data, delivering generally better latency metrics (about 10% faster).
Inlining is now a first-class citizen, as it is generally key to best performance on small inputs.
Among the visible changes:
- XXH_INLINE_ALL can always be set before including xxhash.h, even if xxhash.h was previously included (for example transitively, as part of a prior *.h header file).
- The algorithm implementation has been transferred into xxhash.h. It's no longer necessary to keep a copy of xxhash.c in the /include directory for inlining to work correctly.
  - Note: xxhash.c still exists, as it's useful to instantiate xxhash functions as public symbols accessible from a library or a *.o object file. It also remains compatible with existing projects.
Large data has also received a boost, which can go up to +20% for very large samples (> many MB).
Let's underline the remarkable optimization work of @easyaspi314, who hand-optimized several hot loops and instructions, and even added a new Z-vector target for s390x hardware.
No API modification
The API has remained completely stable between 0.7.2 and 0.7.3. Any programs linking with 0.7.2 should work as-is.
Note that xxh3 / xxh128 results are not comparable across these versions: hash values produced by 0.7.2 should not be expected to match those produced by 0.7.3.
New test tool
Testing a 64-bit hash algorithm for its collision rate has remained elusive for most. The sheer volume of data required to assess quality at this scale is too large for traditional test tools like SMHasher. As a general guide, it takes on the order of 4 to 5 billion hashes to reach a 50% probability of getting a single collision. Accurate collision ratio evaluation requires many more hashes to actually measure something meaningful.
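As a rough cross-check of that order of magnitude, the birthday approximation can be evaluated directly. The sketch below is illustrative only and is not part of xxHash: the function names are made up for this example, and it assumes an ideal hash whose outputs are uniformly distributed over 64 bits.

```python
import math


def expected_collisions(n_hashes: int, hash_bits: int = 64) -> float:
    """Expected number of colliding pairs among n_hashes ideal hash values."""
    space = 2.0 ** hash_bits
    return n_hashes * (n_hashes - 1) / (2.0 * space)


def collision_probability(n_hashes: int, hash_bits: int = 64) -> float:
    """Approximate probability of at least one collision (Poisson approximation)."""
    # 1 - exp(-x), computed with expm1 for accuracy when x is tiny
    return -math.expm1(-expected_collisions(n_hashes, hash_bits))


def hashes_for_probability(p: float, hash_bits: int = 64) -> float:
    """Approximate sample size at which the collision probability reaches p."""
    space = 2.0 ** hash_bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


if __name__ == "__main__":
    for exponent in (30, 32, 34, 36):
        n = 2 ** exponent
        print(f"2^{exponent} hashes: "
              f"expected collisions = {expected_collisions(n):.3f}, "
              f"P(at least one) = {collision_probability(n):.3f}")
    print(f"hashes needed for a 50% chance: {hashes_for_probability(0.5):.3e}")
```

Under these assumptions, a first collision only becomes likely once the sample reaches several billion hashes, and measuring a stable collision ratio (rather than observing a single event) takes considerably more.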
A new open-source tool in tests/collisions offers this capability. It requires a lot of memory to run, with a minimum of 32 GB to measure anything significant. But provided that one has a system with enough capacity, it can accurately measure the collision ratio of any 64-bit hash algorithm.
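The general idea of a brute-force collision measurement can be sketched in a few lines. This is a toy illustration and not the code of the tool in tests/collisions: `toy_hash` and `count_collisions` are hypothetical helpers, the hash is BLAKE2b from Python's standard library rather than xxHash, and its output is deliberately narrowed to 24 bits so that collisions appear with a small sample.

```python
import hashlib


def toy_hash(data: bytes, hash_bits: int = 24) -> int:
    """Hypothetical stand-in hash: BLAKE2b reduced to hash_bits bits (1..64)."""
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little") >> (64 - hash_bits)


def count_collisions(n_inputs: int, hash_bits: int = 24) -> int:
    """Hash n_inputs distinct inputs and count the repeated hash values."""
    # Every hash is kept in memory until the end, then sorted so that
    # equal values become neighbours.
    hashes = sorted(
        toy_hash(i.to_bytes(8, "little"), hash_bits) for i in range(n_inputs)
    )
    # Each adjacent equal pair is one hash value that was already seen.
    return sum(1 for a, b in zip(hashes, hashes[1:]) if a == b)


def expected_collisions(n_inputs: int, hash_bits: int) -> float:
    """Collisions expected from an ideal hash of the same width."""
    return n_inputs * (n_inputs - 1) / (2.0 * 2.0 ** hash_bits)


if __name__ == "__main__":
    n, bits = 100_000, 24
    print("measured:", count_collisions(n, bits))
    print("expected:", round(expected_collisions(n, bits), 1))
```

Comparing the measured count with the ideal expectation is what gives a collision ratio its meaning. Because this sketch keeps every hash until the end, its memory use grows linearly with the sample size, which hints at why the same exercise at full 64-bit width is so demanding.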
Several algorithms were measured thanks to this tool, and the results are currently consolidated on this wiki page. More can be added in the future.
This new development round also introduced several improvements to the SMHasher test suite, uncovering new requirements for new scenarios. This proved beneficial in improving the general dispersion qualities of xxh3 and xxh128.
Changelist
Here is a summarized list of changes for this version:
- perf: improved speed for large inputs (~+20%)
- perf: improved latency for small inputs (~+10%)
- perf: s390x vector code, by @easyaspi314
- cli: Improved support for Unicode filenames on Windows, thanks to @easyaspi314 and @t-mat
- api: xxhash.h can now be included in any order, multiple times, with and without XXH_STATIC_LINKING_ONLY or XXH_INLINE_ALL
- build: xxHash's implementation has been transferred into xxhash.h. There is no more need to have xxhash.c in the /include directory for XXH_INLINE_ALL to work
- install: created pkg-config file, by @bket
- install: vcpkg installation instructions, by @LilyWangL
- doc: Highly improved code documentation, by @easyaspi314
- misc: New test tool in /tests/collisions: brute force collision tester for 64-bit hashes