hash(big(pi)) == hash(float64(pi)) && big(pi) != float64(pi). #3385

stevengj · 2013-06-13T18:37:45Z

The problem seems to be that hash(x::FloatingPoint) calls hash64(float64(x)), implicitly assuming that Float64 is the widest possible floating-point type.
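
A minimal sketch of why narrowing before hashing produces this collision. It is written in current Julia syntax (`Float64(x)` and `AbstractFloat` rather than the `float64(x)` and `FloatingPoint` spellings used in this thread), and `narrowing_hash` is a hypothetical stand-in for the pattern described above, not the actual Base implementation:

```julia
# Hypothetical helper: hash every floating-point value by first converting it
# to Float64, which is the pattern the issue describes.
narrowing_hash(x::AbstractFloat) = hash(Float64(x))

a = big(pi)        # BigFloat, carries more precision than Float64
b = Float64(pi)    # pi rounded to Float64

# The two values are not equal, because the BigFloat keeps extra digits...
@assert a != b

# ...but narrowing discards those digits, so the hashes are forced to coincide.
@assert Float64(a) == b
@assert narrowing_hash(a) == narrowing_hash(b)
```

Any wider type that rounds to the same Float64 would collide in the same way under this assumption, so the collision is systematic rather than accidental.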

A lot of the mess here seems to stem from an attempt to make x == y imply hash(x) == hash(y) even if x and y are of arbitrarily different types. This seems unworkable to me in the long run as more numeric types are added, especially user-defined ones. However, the converse should certainly be true: x != y should imply hash(x) != hash(y) (except in the extremely unlikely event of a hash collision).

JeffBezanson · 2013-06-13T22:05:22Z

Currently we have isequal behaving like == for non-NaN numbers, but we have often thought of changing this, so that for example isequal(2, 2.0) is no longer true. This would probably be a good thing. Then isequal would be the same as ===, except treating all values as immutable. That seems like a nice simple behavior to understand.

StefanKarpinski · 2013-06-13T22:43:21Z

We should probably have a discussion on the dev list about whether it's acceptable to have 2 and 2.0 hash differently or not.

JeffBezanson · 2013-06-30T18:25:46Z

The behavior of isequal should be isequal(x,y) implies isequal(f(x),f(y)) for any pure function f (no mutation or side effects).

f must also be free of flagrant abstraction violations, for example computing with the value of pointer(array).

Type dispatch is routine, so we have to assume f might be different on 2 and 2.0.

There may be cases where implementers need to assume f doesn't access object internals, such as metadata fields that are not really part of an object's value. That is ok.

There are a couple cases where it is not clear what f is allowed to do. One such case is examining the sign bit of a NaN. The sign bit is not meaningful, but you might do signbit(nanval) or copysign(x,nanval).
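
The two situations above can be sketched roughly as follows, in current Julia syntax. The function `f` here is a hypothetical example of a pure function that dispatches on type; the NaN lines only illustrate the sign-bit question and are not a statement about what the final design should allow:

```julia
# Hypothetical pure function whose result depends on the argument's type.
f(x::Int) = "integer"
f(x::Float64) = "float"

# 2 and 2.0 compare equal with ==, yet f tells them apart through dispatch.
@assert 2 == 2.0
@assert f(2) != f(2.0)

# Two NaNs that differ only in the sign bit.
pos_nan = NaN
neg_nan = -NaN    # negation flips the sign bit

# Both are NaN, but signbit and copysign can observe the difference.
@assert isnan(pos_nan) && isnan(neg_nan)
@assert signbit(pos_nan) != signbit(neg_nan)
@assert copysign(1.0, pos_nan) != copysign(1.0, neg_nan)
```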

JeffBezanson · 2013-08-14T18:59:19Z

Stefan and I agreed to go ahead with this. One way to look at it is that 1.0f0 == 1.0, but generally f(1.0f0) will be computed with less precision, so no memoizer would want to return the same result for both. Also, the existence of typed Dicts helps with this --- if you want everything hashed as Float64, you can use a Dict{Float64,T}.
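
A short sketch of both points, assuming current Julia behavior (the exact semantics at the time of this thread may have differed). The variable names are illustrative only:

```julia
# Equal inputs of different precision generally give different results,
# which is why a memoizer would not want to conflate them.
@assert 1.0f0 == 1.0
@assert sin(1.0f0) isa Float32
@assert sin(1.0f0) != sin(1.0)

# A typed Dict converts keys to its key type on insertion, so everything
# stored in it is hashed as a Float64.
d = Dict{Float64,String}()
d[1.0f0] = "inserted with a Float32 key"

@assert keytype(d) == Float64
@assert first(keys(d)) isa Float64
@assert d[1.0] == "inserted with a Float32 key"
```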

Strings in different encodings are superficially similar to numbers of different types, but actually quite different since the different encodings are isomorphic --- strings in different encodings can generate the same sequence of egal characters and thus can be indistinguishable to most programs.

StefanKarpinski · 2013-08-14T23:27:21Z

In particular, it would be pathological to produce unequal results for equal string inputs with different encodings whereas producing unequal results for equal float inputs with different precisions is typical.

JeffBezanson mentioned this issue Jul 31, 2013

Release Candidate plans for 0.2 #3827

Closed

JeffBezanson closed this as completed in fc05d74 Aug 23, 2013

This was referenced Feb 12, 2014

hashing of ranges is awful #5778

Closed

Convert key to Dict key type before hashing #4166

Closed
