hash(big(pi)) == hash(float64(pi)) && big(pi) != float64(pi). #3385

Closed
stevengj opened this issue Jun 13, 2013 · 5 comments
Labels
breaking (this change will break code), bug (indicates an unexpected problem or unintended behavior), needs decision (a decision on this change is needed)
Comments

@stevengj
Member

The problem seems to be that hash(x::FloatingPoint) calls hash64(float64(x)), implicitly assuming that Float64 is the widest possible floating-point type.

A lot of the mess here seems to stem from an attempt to make x == y imply hash(x) == hash(y) even if x and y are arbitrarily different types. This seems unworkable to me in the long run as more numeric types are added, especially user-defined types. However, the converse should certainly be true: x != y should imply hash(x) != hash(y) (except in the extremely unlikely event of a hash collision).
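
For illustration only, here is a minimal Python sketch of the failure mode described above. It is an analog, not the Julia implementation: `narrowing_hash` is a hypothetical stand-in for a hash that converts every value to a 64-bit float first, and a 36-digit `Decimal` plays the role of big(pi).

```python
import math
from decimal import Decimal

# Hypothetical stand-in for the behavior described above: every value is
# narrowed to a 64-bit float before it is hashed.
def narrowing_hash(x):
    return hash(float(x))

# 36 significant digits of pi, playing the role of big(pi) (an assumption
# made for this sketch; Julia's BigFloat is a different type).
big_pi = Decimal("3.14159265358979323846264338327950288")
float_pi = math.pi  # the nearest 64-bit float to pi

values_equal = (big_pi == float_pi)  # exact comparison of the two values
hashes_equal = (narrowing_hash(big_pi) == narrowing_hash(float_pi))

print("values equal:", values_equal)
print("narrowed hashes equal:", hashes_equal)
```

The two values differ, but both narrow to the same 64-bit float, so the narrowed hashes agree.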

@JeffBezanson
Member

Currently we have isequal behaving like == for non-NaN numbers, but we have often thought of changing this, so that for example isequal(2, 2.0) is no longer true. This would probably be a good thing. Then isequal would be the same as ===, except treating all values as immutable. That seems like a nice simple behavior to understand.

@StefanKarpinski
Member

We should probably have a discussion on the dev list about whether it's acceptable to have 2 and 2.0 hash differently or not.

@JeffBezanson
Member

The behavior of isequal should be isequal(x,y) implies isequal(f(x),f(y)) for any pure function f (no mutation or side effects).

f must also be free of flagrant abstraction violations, for example computing with the value of pointer(array).

Type dispatch is routine, so we have to assume f might be different on 2 and 2.0.

There may be cases where implementers need to assume f doesn't access object internals, such as metadata fields that are not really part of an object's value. That is ok.

There are a couple cases where it is not clear what f is allowed to do. One such case is examining the sign bit of a NaN. The sign bit is not meaningful, but you might do signbit(nanval) or copysign(x,nanval).
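
As a rough illustration of the NaN case, the following Python sketch (an analog, not Julia code) builds two NaN values that differ only in the sign bit and applies a pure function to each. The names `pos_nan`, `neg_nan`, and `f` are made up for this example.

```python
import math

nan = float("nan")
pos_nan = math.copysign(nan, 1.0)   # NaN with the sign bit cleared
neg_nan = math.copysign(nan, -1.0)  # NaN with the sign bit set

# Both values are NaN.
both_nan = math.isnan(pos_nan) and math.isnan(neg_nan)

# A pure function that only looks at the sign bit of its argument.
def f(x):
    return math.copysign(1.0, x)

print(both_nan, f(pos_nan), f(neg_nan))
```

Whether two such NaNs should count as "the same value" depends on whether a function like `f` is considered legitimate, which is the open question raised above.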

@JeffBezanson
Member

Stefan and I agreed to go ahead with this. One way to look at it is that 1.0f0 == 1.0, but generally f(1.0f0) will be computed with less precision, so no memoizer would want to return the same result for both. Also, the existence of typed Dicts helps with this --- if you want everything hashed as Float64, you can use a Dict{Float64,T}.
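
The memoizer argument can be sketched in Python, where 1 == 1.0 and both hash alike, much like the behavior being changed here. This is an analog under stated assumptions, not Julia code: `f`, `memoize_by_value`, and `memoize_by_type_and_value` are hypothetical helpers, and int versus float stands in for values of different precision.

```python
def f(x):
    # Integer inputs are added exactly; float inputs round to the nearest
    # 64-bit float, so the two results differ.
    return 10**17 + x

def memoize_by_value(func):
    cache = {}
    def wrapper(x):
        if x not in cache:          # 1 and 1.0 are the same dict key
            cache[x] = func(x)
        return cache[x]
    return wrapper

def memoize_by_type_and_value(func):
    cache = {}
    def wrapper(x):
        key = (type(x), x)          # int and float get separate entries
        if key not in cache:
            cache[key] = func(x)
        return cache[key]
    return wrapper

by_value = memoize_by_value(f)
by_value(1)                         # caches the exact integer result
stale = by_value(1.0)               # returns the cached integer result

by_type = memoize_by_type_and_value(f)
by_type(1)
fresh = by_type(1.0)                # computed separately, in floating point

print(repr(stale), repr(fresh))
```

Keying the cache on the type as well as the value is one way to keep the two results apart. A container restricted to a single key type, as suggested above, is the alternative when converting every key to one type is actually wanted.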

Strings in different encodings are superficially similar to numbers of different types, but actually quite different since the different encodings are isomorphic --- strings in different encodings can generate the same sequence of egal characters and thus can be indistinguishable to most programs.

@StefanKarpinski
Member

In particular, it would be pathological to produce unequal results for equal string inputs with different encodings whereas producing unequal results for equal float inputs with different precisions is typical.
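
A small Python sketch of the string side of this comparison, offered as an analog only: Python's `str` has no encoding of its own, so encoded `bytes` stand in for strings stored in different encodings, and the sample text is arbitrary.

```python
text = "naïve"

# The same text stored in two different encodings.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# The stored representations differ...
storage_differs = utf8_bytes != utf16_bytes

# ...but both yield the same sequence of characters.
chars_from_utf8 = list(utf8_bytes.decode("utf-8"))
chars_from_utf16 = list(utf16_bytes.decode("utf-16-le"))
same_chars = chars_from_utf8 == chars_from_utf16

print(storage_differs, same_chars)
```

A program that only walks the characters cannot distinguish the two, which is the sense in which the encodings are isomorphic. The same is not true of floats with different precisions.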
