Improved hashing algorithm #214

Merged
merged 3 commits into from
Feb 26, 2019

Conversation

syrusakbary
Member

@syrusakbary syrusakbary commented Feb 26, 2019

Based on this feedback:
https://twitter.com/jedisct1/status/1100200484764303361

Speed analysis

These are the times to hash a 1 MB file (nginx), with each slowdown given relative to meowhash:

  • meowhash hashes in about 50 µs - 1x
  • blake2bp (SIMD version) hashes in about 500 µs - 10x
  • blake2 (non-SIMD version) hashes in about 1 ms - 20x
  • sha256 (previous implementation) hashes in about 5 ms - 100x

blake2

nginx HASH              time:   [1.0330 ms 1.0471 ms 1.0589 ms]
                        change: [-3.3836% -0.6012% +2.1520%] (p = 0.69 > 0.05)

blake2bp

nginx HASH              time:   [510.72 us 526.18 us 541.60 us]
                        change: [-6.8755% -1.4050% +3.7167%] (p = 0.66 > 0.05)

meowhash

nginx HASH              time:   [47.033 us 48.020 us 49.028 us]
                        change: [-95.542% -95.334% -95.097%] (p = 0.00 < 0.05)
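
The relative figures can be checked against the median times in the benchmark output above. This is a minimal sketch, assuming the file is exactly 1 MB taken as 10^6 bytes (the exact size is not stated); the helper names `slowdown` and `throughput_gb_per_s` are illustrative and not part of this PR:

```rust
/// Ratio of a candidate's median time to the baseline's median time.
/// Both arguments must use the same unit (microseconds here).
fn slowdown(candidate_us: f64, baseline_us: f64) -> f64 {
    candidate_us / baseline_us
}

/// Throughput in GB/s (10^9 bytes per second) for `bytes` hashed
/// in `time_us` microseconds.
fn throughput_gb_per_s(bytes: f64, time_us: f64) -> f64 {
    let seconds = time_us * 1e-6;
    bytes / seconds / 1e9
}

// Median times from the benchmark output, in microseconds.
const MEOWHASH_US: f64 = 48.020;
const BLAKE2BP_US: f64 = 526.18;
const BLAKE2_US: f64 = 1047.1;

// Assumption: the file is 10^6 bytes; the exact size is not given.
const FILE_BYTES: f64 = 1.0e6;
```

With these inputs, `slowdown(BLAKE2BP_US, MEOWHASH_US)` comes out near 11 and `slowdown(BLAKE2_US, MEOWHASH_US)` near 22, in line with the rounded 10x and 20x figures. Under the stated size assumption, `throughput_gb_per_s(FILE_BYTES, MEOWHASH_US)` is roughly 20.8.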

Other resources

Collision analysis: cmuratori/meow_hash#7 (no collisions across 2M files in a 156 GB dataset, using the non-truncated version of the hash).
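
For context on why the "non-truncated" qualifier matters, the birthday bound gives a rough collision probability for an ideal hash. This is a sketch under the assumption of a uniformly random hash with `bits` output bits; the function name is illustrative, and the widths mentioned below are examples rather than the widths meowhash actually offers:

```rust
/// Approximate probability of at least one collision among `n` inputs
/// under an ideal uniformly random hash with `bits` output bits,
/// using the birthday approximation p ≈ 1 - exp(-n(n-1) / 2^(bits+1)).
fn birthday_collision_probability(n: f64, bits: i32) -> f64 {
    // Number of distinct input pairs that could collide.
    let pairs = n * (n - 1.0) / 2.0;
    let x = pairs / 2.0_f64.powi(bits);
    // -exp_m1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    -(-x).exp_m1()
}
```

For 2,000,000 inputs this gives a probability close to 1 at 32 bits, about 1e-7 at 64 bits, and a negligible value at 128 bits, so an absence of collisions at this dataset size is only informative for wide outputs.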

Hacker News article: https://news.ycombinator.com/item?id=18262627

@jedisct1

You're using blake2, not blake2bp here. See https://docs.rs/blake2b_simd/0.4.1/blake2b_simd/blake2bp/index.html

@syrusakbary
Member Author

@jedisct1 thanks for the ping! I'll update the code and benchmark soon.

@syrusakbary
Member Author

After more research, we decided to merge this, since the timings with blake2bp (about 500 µs for a 1 MB file) are good enough for our use case.

Thanks @jedisct1 and @briansmith for bringing this to our attention.

@syrusakbary syrusakbary merged commit e5dc0b1 into master Feb 26, 2019
@syrusakbary syrusakbary deleted the fix/better-hashing branch March 4, 2019 15:46
nlewycky added a commit that referenced this pull request Aug 13, 2020
Rename CompilationNamer to SymbolRegistry and use it in compiler-llvm to name generated functions.

3 participants