Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test for "blinding multiplication" #99

Closed
erthink opened this issue Feb 5, 2020 · 18 comments
Closed

Test for "blinding multiplication" #99

erthink opened this issue Feb 5, 2020 · 18 comments
Assignees

Comments

@erthink
Copy link
Contributor

erthink commented Feb 5, 2020

I recently briefly described the "blinding multiplication" flaw of NH-approach hash functions. This seems to be an obvious thing, but many users do not know about it and have no idea about the possible consequences.

This looks to me to be a rather serious drawback, despite the low probability, since some of the data does not affect the hash value at all (e.g. wyhash case).

I think we need to come up with some kind of test that would allow us to clearly identify such flaws.

@erthink
Copy link
Contributor Author

erthink commented Feb 5, 2020

@wangyi-fudan, FYI.

@erthink
Copy link
Contributor Author

erthink commented Feb 5, 2020

@gzm55, FYI

@wangyi-fudan
Copy link
Contributor

Dear leo-yuriev:
It is not a significant problem in wyhash_v5.
I write the porblem here: MUM(a,b)==0 when a=0 or b=0. so we have chance to loss entropy.
wyhash_v5 use secret-mask-MUM: MUM(a^secret1,b^secret2)
unless a==secret1 or b==secret2 we lose entropy.
the chance is as low as 2^-64.

@erthink
Copy link
Contributor Author

erthink commented Feb 6, 2020

Dear @wangyi-fudan, as I commented wangyi-fudan/wyhash#49:

The actual chance is 2^-63 per each multiplication. Moreover, the later a zero rolls, (then) the greater the losses will be. For example, if a zero rolls at the end of the main loop, then the 1/4 of the data will be lost.

For many applications, this is an unacceptable flaw. Therefore, users should know about the price of speed, in order to make their own conclusions and assess the risks (regardless of how you assess the severity of the problem).


Now SMHasher includes a table listing many drawbacks of hash functions, and many developers use this information to select hash functions. Therefore, it is irresponsible to keep silent about an aspect, especially if it is simple enough to explain it.

In addition, most (seem with single exception) hash function designers consider it unacceptable to intentionally leave a such known flaws (especially when ones are obviously).

@wangyi-fudan
Copy link
Contributor

I updated wyhash readme to state this limitation.

@rurban
Copy link
Owner

rurban commented Feb 6, 2020

@leo-yuriev What should be the proper abbreviation for this flaw for the rhs comments? blind mult?

@erthink
Copy link
Contributor Author

erthink commented Feb 6, 2020

blind mult

Nothing better comes to mind

@wangyi-fudan
Copy link
Contributor

well, I suggest that any flaw statement should come with a testable code and apply it on every hash function. We can not use mouth/words only.

@wangyi-fudan
Copy link
Contributor

This problem is fully solved:
wangyi-fudan/wyhash#49

@rurban
Copy link
Owner

rurban commented Feb 12, 2020

This problem is fully solved:
wangyi-fudan/wyhash#49

Only in wyhash, but for about half of the others hashes not.

@erthink
Copy link
Contributor Author

erthink commented Feb 15, 2020

@rurban, I have a few ideas for such a test and an intention to do it.
So you can assign this issue to me.
however, I can't make any promises, since too busy most of the time (

@avaneev
Copy link
Contributor

avaneev commented Aug 27, 2020

As for the PRVHASH, this isn't a problem if InitVec specification is followed: it should be a uniformly-random sequence of bits, preferrably composed of 16-bit words with 4-12 bits set. It is impossible to obtain a 128-bit zero. SeedXOR on the other hand, can be any value - even if it is equal to the default Seed, the system won't break.

@rurban
Copy link
Owner

rurban commented Mar 29, 2021

I would at least give devs the possibilty to skip a list of bad seeds, they are getting from an initial rand. And list the hashes which are vulnerable to bad seeds (with a very low percentage indeed, 1-2 of 2^32 or 2^64).

This would not affect the hash performance, only the init call needs to reject a bad seed. Write more such seed_init() calls.

eg clhash skips its bad seeds in get_random_key_for_clhash(), but fails to check if the two seed arguments are non-zero, initializing xorshift128plus_init with bad values.

rurban added a commit that referenced this issue Mar 29, 2021
started with for wyhash and clhash.
See GH #99
rurban added a commit that referenced this issue Mar 29, 2021
for now just exhaustive, against some simple 16byte keys.
need to check the bad_seeds lists also, as these tests will
actually terminate

See GH #99
@rurban
Copy link
Owner

rurban commented Mar 29, 2021

I called the new test BadSeeds, and will add the list of bad seeds to the static HashInfo list in main.cpp.

wyhash32 has eg {0x1bc1d52e, 0x1cbc261d, 0x33a0d1d9, 0x429dacdd, 0xd637dbf3}
wyhash64 has { 0xa0761d6478bd642f, 0xe7037ed1a0b428db }
clhash { 0 } as many others

waiting for confirmation will need a few days of CPU

@avaneev
Copy link
Contributor

avaneev commented Mar 29, 2021

Just a note, latest PRVHASH has no "bad" seeds at all... prvhash64s even starts at "full zero" state.

rurban added a commit that referenced this issue Mar 29, 2021
started with for wyhash and clhash.
See GH #99
rurban added a commit that referenced this issue Mar 29, 2021
for now just exhaustive, against some simple 16byte keys.
need to check the bad_seeds lists also, as these tests will
actually terminate

See GH #99
@rurban
Copy link
Owner

rurban commented Mar 30, 2021

I've added now all known bad seeds to main.cpp and README.md
and added https://rurban.github.io/smhasher/doc/badseeds.log.txt
Just some <hash>_bad_seeds() lists are missing.

Special care needs to be taken for crc, FNV1 variants, fletcher, Jenkins, and with GOOD hashes all MUM variants, like mirhash, MUM, wyhash32.

@rurban rurban closed this as completed Mar 30, 2021
@wangyi-fudan
Copy link
Contributor

Now, wyhash has removed the default secret. Thus it is hard to break.

@tommyettinger
Copy link

I've been trying to figure out how to properly use the BadSeeds test in my fork of smhasher (it includes so much experimental junk that there's no point in cleaning it up for PRs, sorry about that). I have a hash that's very similar to a reduced-bit-size version of wyhash (WaterHash) and another that's a little different, but still somewhat mum-type (WootHash). I'm pretty sure the first of these has bad seeds, but I'm not sure how to send billions of possible seeds (like all with 0 in the lower 32 bits) to BadSeeds to check. The second of these might not qualify as "blind" multiplication, so it could be resistant or immune to this issue -- or it might not, so it would be great to be able to scan for problematic seeds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants