
lose entropy #49

Closed
gzm55 opened this issue Feb 5, 2020 · 12 comments

@gzm55
Contributor

gzm55 commented Feb 5, 2020

It should be noted that wyhash can lose entropy (similarly to what is described below), for instance when the input data is correlated with, or equal to, the seed ^ _wypN values.

@leo-yuriev described this issue in the README of the t1ha hash.

@wangyi-fudan
Owner

I don't think so, because we keep the mask secret; thus it can only lose entropy with 2^-64 probability.

@erthink

erthink commented Feb 6, 2020

The actual chance is 2^-63 per multiplication. Moreover, the later a zero rolls, the greater the loss will be. For example, if a zero rolls at the end of the main loop, then 1/4 of the data will be lost.

For many applications this is an unacceptable flaw. Therefore, users should know the price of that speed, so that they can draw their own conclusions and assess the risks (regardless of how you yourself rate the severity of the problem).

@wangyi-fudan
Owner

Thanks!
I also warn that DRAM is dangerous: http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf

@erthink

erthink commented Feb 6, 2020

It is clear that you can't fix the flaw without significantly slowing down wyhash. That is not a problem; wyhash is still good and fast for many applications. However, you should not excuse or gloss over this flaw.

On the contrary, it should be well documented, so that users understand exactly what is happening and can knowingly choose the speed that wyhash provides.

@wangyi-fudan
Owner

OK, my friend!

@erthink

erthink commented Feb 6, 2020

@wangyi-fudan, I don't want to nitpick, but the current wording is still not correct, since as the length of the hashed data increases, the probability of losing something tends to 100%.

I think it is better to state it as "at least 2^-63 per 64-bit word", give the exact analytical formula (i.e. 1 - (1 - 2^-63)^N, where N is the number of words) and specify the probability for 1,000,000 bytes (for instance).

$ echo "scale = 32; 1.0 - ((1.0 - (2.0 ^ -63)) ^ 125000)" | bc
.00000000000001355252715606865817

This value is small enough that there is nothing to be ashamed of ;)

@erthink

erthink commented Feb 6, 2020

Minor issue: "FastestHash" is not a hash, but a do-nothing stub.

@wangyi-fudan
Owner

The FastestHash does do things, providing useful hashes for short strings in a hash table.
I will update the number.
Cheers!
We are in a deadly flu season, so I imagine if one day I die, at least wyhash will be there.

@wangyi-fudan
Owner

wangyi-fudan commented Feb 6, 2020

The scale=32 confused me ;-).
It is 2^-66 per byte, so it will be 2^-26 per TB (2^40 bytes), i.e. approximately one loss per 64 million TB.

@gzm55
Contributor Author

gzm55 commented Feb 11, 2020

PR #52 should improve this issue with very little runtime cost.

@wangyi-fudan
Owner

Dear all:
The issue is fully solved. Based on @gzm55's idea, we can use the following mix function:
_wymix(A,B) = _wymum(A,B) ^ A ^ B
If A==0 and B!=0, it will return B. The cost is not bad.

| HashFunction | Words | Hashmap | Bulk64K | Bulk16M |
| --- | --- | --- | --- | --- |
| FastestHash | 337.38 | 51.33 | 14233.53 | 3435973.84 |
| std::hash | 96.53 | 36.12 | 7.33 | 7.35 |
| wyhash | 249.93 | 44.28 | 21.61 | 19.28 |
| xxHash64 | 109.42 | 35.19 | 14.71 | 14.49 |
| XXH3_scalar | 181.24 | 42.13 | 13.12 | 13.07 |
| t1ha2_atonce | 124.07 | 35.80 | 17.10 | 16.51 |

Cheers!

@rurban
Contributor

rurban commented Feb 12, 2020

> The FastestHash does do things, providing useful hashes for short strings in a hash table.
> I will update the number.
> Cheers!
> We are in a deadly flu season, so I imagine if one day I die, at least wyhash will be there.

It's not a deadly flu, just a very bad cold, much less deadly than SARS or MERS, not to mention a really dangerous flu like the Spanish flu. It will be over in May when warm temperatures come; hold on.
