-
Notifications
You must be signed in to change notification settings - Fork 796
Performance comparison
Platform : Intel i7-9700K, Ubuntu x64 20.04, clang v10.00
Hash Name | Width | Large Data Bandwidth (GB/s) | Small Data Velocity | Comment |
---|---|---|---|---|
XXH3 (SSE2) | 64 | 31.5 GB/s | 133.1 | |
XXH128 (SSE2) | 128 | 29.6 GB/s | 118.1 | |
RAM sequential read | N/A | 28.0 GB/s | N/A | |
City64 | 64 | 22.0 GB/s | 76.6 | |
T1ha2 | 64 | 22.0 GB/s | 99.0 | Slightly worse collision ratio |
City128 | 128 | 21.7 GB/s | 57.7 | |
XXH64 | 64 | 19.4 GB/s | 71.0 | |
SpookyHash | 64 | 19.3 GB/s | 53.2 | |
Mum | 64 | 18.0 GB/s | 67.0 | Slightly worse collision ratio |
XXH32 | 32 | 9.7 GB/s | 71.9 | |
City32 | 32 | 9.1 GB/s | 66.0 | |
Murmur3 | 32 | 3.9 GB/s | 56.1 | |
SipHash | 64 | 3.0 GB/s | 43.2 | |
HighwayHash | 64 | 1.4 GB/s | 6.0 | |
FNV64 | 64 | 1.2 GB/s | 62.7 | |
Blake2 | 128 | 1.1 GB/s | 5.1 |
Note : Small data velocity is a rough average of algorithm's efficiency for small data. For more accurate information, please refer to graphs in second part below.
The following algorithms run on modern x64 cpus, but are either not portable, or their performance depend on the presence of a non-guaranteed instruction set (only SSE2
is guaranteed on x64
).
Hash Name | Width | Large Data Bandwidth (GB/s) | Small Data Velocity | Comment |
---|---|---|---|---|
XXH3 (AVX2) | 64 | 60.5 GB/s | 133.1 | AVX2 support (optional) |
MeowHash (AES) | 128 | 58.2 GB/s | 52.5 | Requires x64 + hardware AES (not portable) |
XXH128 (AVX2) | 128 | 55.8 GB/s | 118.1 | AVX2 support (optional) |
CLHash | 64 | 37.1 GB/s | 58.1 | Requires PCLMUL intrinsics (not portable) |
FarmHash | 64 | 21.3 GB/s | 71.9 | Results vary depending on instruction set |
CRC32C (hw) | 32 | 13.0 GB/s | 57.9 | Requires SSE4.2 |
CRC32C can be done in software, but performance becomes drastically slower.
Note : some algorithms are effectively faster than system's RAM. Their full speed can only be leveraged when hashing data segments already present in cache.
It's interesting to note that the layers of caches are clearly visible in the graph. The host cpu has an L1 cache of 32 KB, L2 cache of 256 KB, and L3 cache of 12 MB. Performance of the AVX2
variant visibly plummets on passing each of these thresholds.
So far, we have looked at 64-bit performance, where 64-bit hashes have an advantage, if only by virtue of ingesting data faster. The situation changes drastically on 32-bit systems, as emulating 64-bit arithmetic on 32-bit instruction set takes a big toll on performance.
The following graph shows how these hash algorithms fare in 32-bit mode. Most of them suffer a large decrease in performance. Only 32-bit hashes maintain their performance, like XXH32
. But good news is, XXH3
is also good on 32-bit cpus, and even more so when a vector unit is available.
In many use cases, the generated hash value can be used to populate a hash table or a bloom filter. In these scenarios, it's frequent that the hash algorithm is just a part of a larger system, such as a database, and it will be invoked for a lot of rather small objects. When hashing small objects, hash properties can vary vastly compared to hashing "large" inputs. Some algorithms, for example, may have a long start or end stages, becoming dominant for small inputs. The below tests try to capture these characteristics.
Throughput tests try to generate as many hashes as possible. This is classically what is measured, because it's the easiest methodology for benchmark. However, it is only representative of "batch mode" scenarios, and also requires all inputs to feature exactly the same length. So it's effectively not that representative at all.
In this more advanced scenario, input length is not longer stable. This situation is generally more common, so the scenario is more likely to be representative of some real use case. However, being a throughput test, it's still targeting batch-mode scenarios, where a lot of hashes must be produced repetitively. In the graph below, length is effectively random, read from a generated series of value between 1 and N. This situation is much harder on the branch predictor, which is likely to miss quite a few guesses, occasionally triggering a pipeline flush. Such negative impact can quickly dominate performance for small input. Some algorithm are better design to resist this side-effect.
Latency tests require the algorithm to complete one hash before starting the next one. This is generally more representative of situations where the hash is integrated into a larger algorithm.
=== benchmarking meowh hash functions ===
Throughput small inputs of fixed size :
meowh , 65787385.6, 65844105.4, 65863964.3, 65844637.7, 65869532.0, 65892430.6, 65853500.7, 65853515.2, 65867594.0, 65913947.5, 65873020.5, 65881190.3, 65861045.0, 65871610.4, 65855408.6, 83414172.4, 66207586.6, 66146234.7, 66192991.3, 66102162.7, 66192045.2, 66192309.1, 66193464.2, 66080562.8, 66156578.6, 66187151.9, 66192518.3, 66117645.7, 66140351.9, 66202624.1, 66135636.3, 79580277.3, 63385998.1, 63389437.6, 63364834.4, 63391423.5, 63385252.5, 63405039.7, 63338309.8, 63401949.7, 63380281.7, 63391964.2, 63384010.7, 63399508.1, 63392085.3, 63394442.5, 63366825.6, 78417528.6, 63233944.4, 63223189.1, 63245802.3, 63245851.5, 63215755.4, 63229949.5, 63218255.6, 63223424.7, 63227043.4, 63149236.2, 63249271.7, 63232153.3, 63212240.2, 63231271.5, 63222717.6, 78296420.5, 62551895.9, 62580834.8, 62505507.9, 62499264.9, 62552210.2, 62588293.5, 62532703.9, 62602626.0, 62516914.7, 62500735.1, 62513235.6, 62509180.4, 62515784.6, 62495597.2, 62507707.8, 76929863.8, 62113335.4, 62129116.1, 62133797.7, 62130212.3, 62122798.6, 62118521.5, 62133763.2, 62125827.7, 62109391.1, 62116623.0, 62130873.7, 62057216.7, 62132004.8, 62119509.9, 62126818.5, 73371258.0, 58754330.6, 58785449.9, 58763341.3, 58770588.2, 58779214.5, 58756078.8, 58758490.9, 58738791.7, 58739120.2, 58770237.2, 58778540.6, 58752602.7, 58764378.0, 58767131.3, 58779910.8, 72107154.8, 59384211.7, 59378510.8, 59358474.4, 59391030.2, 59375073.5, 59375661.6, 59319444.9, 59379153.8, 59381892.0, 59379963.5, 59364435.4, 59371471.6, 59376819.9, 59370129.7, 59376176.6
benchmarking random size inputs [1-N] :
meowh , 65846769.4, 65867382.4, 65840401.9, 65853543.8, 65890331.9, 65843500.4, 65822531.6, 65865619.5, 65853687.2, 65837282.3, 65860373.2, 65847498.0, 65805715.9, 65857518.5, 65835179.9, 65169015.6, 64833710.7, 64682817.3, 64440249.6, 64199724.6, 64087026.1, 63356748.7, 63242524.8, 63075365.5, 62896188.4, 62840437.2, 62881099.8, 62590923.8, 62596236.4, 62307787.3, 62908727.4, 63876250.1, 64342311.9, 63392468.2, 63452941.2, 62945249.4, 63102938.8, 63833918.8, 62875820.8, 62748283.0, 63218086.2, 62905729.6, 62708621.6, 62597551.0, 62584054.9, 62210694.1, 62746227.9, 63139047.3, 63461334.8, 62967160.1, 62912952.7, 62931131.0, 62385126.5, 62633221.5, 62233434.0, 62456557.9, 62119818.9, 62381028.2, 62580151.8, 62213594.5, 62449978.8, 62065675.7, 61738691.2, 62066445.5, 62516945.1, 62964206.4, 62752427.7, 62338227.7, 62382364.0, 62186435.8, 62365819.1, 62952354.0, 62867872.1, 62599137.9, 62298768.0, 63027192.8, 62119653.1, 62886518.9, 63168622.5, 63559346.9, 63054592.5, 63379493.6, 62799797.3, 63131922.8, 62582573.4, 62935910.5, 63055651.9, 62781218.6, 62765630.9, 63028009.2, 63066403.0, 62627453.3, 61988338.5, 62880014.1, 62751365.2, 63130703.9, 63666197.3, 63029312.1, 62963094.1, 62947202.2, 63053664.6, 62911247.4, 63129439.2, 63055465.5, 62936443.8, 63488701.8, 63467018.3, 63150083.0, 62621871.1, 62834051.9, 63003589.6, 63004447.4, 62520215.7, 62851926.0, 62718509.6, 62819215.3, 62690545.4, 62398908.4, 62543416.6, 62700699.9, 62793544.7, 62303387.3, 62515092.8, 62123518.3, 62483785.0, 62721378.2, 62373380.4
Latency for small inputs of fixed size :
meowh , 48830064.1, 48705461.6, 49450905.0, 49567089.4, 48978655.3, 48812478.7, 49917627.7, 49092096.2, 49204435.2, 49094984.0, 48290992.9, 48927397.1, 48887215.9, 49910307.3, 49921180.2, 55971048.0, 45679934.6, 47554052.1, 48203613.9, 48862674.1, 48268273.5, 48531072.1, 48188378.1, 48240041.4, 48207867.4, 48806169.4, 48170032.6, 47938847.0, 48656978.0, 48464163.8, 49501866.2, 57215979.3, 46820676.6, 47174860.5, 48198865.9, 47972325.0, 48210767.7, 46132580.4, 47127194.0, 47430272.1, 47107418.6, 46885843.7, 48198722.7, 46927479.5, 47289110.6, 46570429.0, 47052320.9, 53080791.7, 48085631.9, 47114972.1, 47239000.4, 47032323.3, 46647213.2, 47317294.2, 47721678.5, 47032527.5, 47227189.3, 48069118.8, 48085972.4, 48160173.2, 46953493.6, 47065536.5, 47113427.5, 54972975.2, 46927670.1, 46821050.7, 47055225.2, 46457600.6, 46950236.0, 46790539.5, 46698190.9, 46293900.4, 46237368.7, 45901292.2, 47449891.8, 46169322.8, 46576470.6, 46401214.3, 46789989.1, 52717356.1, 46597533.1, 47620169.2, 46805156.9, 47387427.7, 46248588.4, 47756295.3, 46756561.3, 46516141.6, 46864247.8, 46394936.4, 47175415.4, 46652117.9, 46928859.7, 46385233.9, 46699260.8, 52752995.5, 44521031.3, 44212185.6, 45138786.8, 44509931.9, 44185212.0, 44609403.0, 44292965.2, 43989459.9, 44274234.3, 44754800.8, 44360530.8, 44877262.5, 44239354.9, 44828458.6, 44233993.1, 49684983.3, 46029878.8, 45165237.2, 45037943.4, 44784370.5, 45781006.4, 45112914.6, 45521914.0, 45190164.5, 44403134.3, 44907429.2, 45684801.3, 45137683.8, 45435281.7, 44892460.5, 44968409.2
Latency for small inputs of random size [1-N] :
meowh , 48893412.2, 48997406.0, 49038981.8, 48955079.9, 49035984.4, 48998729.3, 49007269.0, 48962689.8, 48805454.7, 48915632.9, 48959565.1, 48922217.3, 48965858.0, 48920617.9, 48935006.3, 49182352.9, 49115763.2, 49093358.0, 49125944.1, 49129514.2, 49229502.4, 49092397.1, 49065393.1, 49149082.6, 49113602.4, 49090042.5, 49078311.2, 49093722.2, 49040198.1, 48934142.7, 49260736.1, 49323580.8, 49302238.0, 49087080.3, 48976999.4, 48951789.4, 48873020.7, 48950390.7, 48833995.0, 48744469.5, 48796838.6, 48782640.4, 48626712.9, 48704010.9, 48663375.4, 48471728.7, 48460819.0, 48637639.9, 48614710.5, 48491157.5, 48505202.2, 48443354.3, 48490339.6, 48335900.1, 48258586.5, 48372580.0, 48372202.5, 48169386.9, 48185047.5, 48307616.3, 48214737.3, 48071669.8, 48166473.5, 48232097.5, 48140764.4, 48242826.5, 48202696.1, 48164989.2, 48077658.5, 47971760.6, 48088576.2, 47964381.1, 47993368.1, 47971955.6, 47845819.7, 47968044.9, 47883230.6, 47877756.9, 47841203.9, 48024841.7, 47796107.6, 47976708.6, 47936500.5, 47819640.9, 47875970.8, 47930783.0, 47858956.9, 47885418.5, 47777418.7, 48030541.9, 47910775.1, 47850579.5, 47759524.2, 47789598.7, 47808390.0, 47754329.0, 47802013.5, 47864173.1, 47901478.6, 47719587.1, 47697203.9, 47613162.9, 47671155.5, 47570868.1, 47533896.1, 47533648.1, 47546201.8, 47486871.3, 47358855.7, 47373405.3, 47370463.1, 47220572.4, 47351827.0, 47280002.4, 47272812.8, 47271765.6, 47184295.8, 47330839.5, 47281949.9, 47206715.4, 47214086.6, 47133293.3, 47250485.2, 47129749.4, 47170366.2, 47210982.0, 47096869.0