Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python 3 hashing #1772

Merged
merged 3 commits into from
Sep 10, 2015
Merged

Python 3 hashing #1772

merged 3 commits into from
Sep 10, 2015

Conversation

QuLogic
Copy link
Member

@QuLogic QuLogic commented Sep 7, 2015

Several classes must be hashable because they are used as dictionary keys. This is generally implemented as a hash of the same information used for __eq__, or if that's not defined, the id.

@QuLogic QuLogic mentioned this pull request Sep 7, 2015
14 tasks
@@ -146,6 +146,9 @@ def __new__(cls, coord, dims):
kwargs)
return metadata

def __hash__(self):
return hash(tuple(self))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really surprised a namedtuple isn't hashable by default...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are: 😕

$ python2
python 2.7.10 ...
>>> import collections
>>> hash(collections.namedtuple('Foo', 'foo bar')(1, 2))
3713081631934410656

$ python3
Python 3.4.3 ...
>>> import collections
>>> hash(collections.namedtuple('Foo', 'foo bar')(1, 2))
3713081631934410656

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are, but when you override __eq__, you must supply __hash__ as well. Unfortunately, you can't do something like __hash__ = namedtuple.__hash__ because namedtuple is actually a function and the real class is hidden. This implementation doesn't exactly match __eq__, but it should match Python 2 behaviour.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when you override __eq__, you must supply __hash__ as well

Seems to be a runtime check when you try to hash an object...

Python 3.4.3 ...
>>> import collections
>>> Foo = collections.namedtuple('Foo', 'foo bar')
>>> hash(Foo(1, 2))
3713081631934410656
>>> class Bar(Foo): __eq__ = lambda self, other: True
... 
>>> hash(Bar(1, 2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Bar'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be a runtime check when you try to hash an object...

Or I could have just read @QuLogic's original commit message. 😒 Doh!

With new-style objects, if __eq__ is overridden, then __hash__ is automatically undefined.

>>> print(Foo.__hash__)
<slot wrapper '__hash__' of 'tuple' objects>
>>> print(Bar.__hash__)
None

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So how about: super(_CoordMetaData, self).__hash__()?

@ajdawson
Copy link
Member

ajdawson commented Sep 8, 2015

Also raised in #962.

@@ -210,6 +210,9 @@ def __add__(self, mod):
bound = tuple([val + mod for val in bound])
return Cell(point, bound)

def __hash__(self):
return hash(tuple(self))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably more efficient as super(Cell, self).__hash__()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If so, the same would apply for the other namedtuple subclasses in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's more efficient, but I think it might be clearer that way. Just running %timeit hash(b) with the class in your example above shows super to be maybe 50 ns slower, but the slowest time is anywhere from 8 to 20-25 times slower for either method. So there's some sort of caching, but for that first hit, it seems to be a toss-up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking 👍

With new-style objects, if __eq__ is overridden, then __hash__ is
automatically undefined.
This is simply based on the id, which is nearly equivalent to Python 2's
method, but perhaps it could be made more general.
rhattersley added a commit that referenced this pull request Sep 10, 2015
@rhattersley rhattersley merged commit f3ccc08 into SciTools:master Sep 10, 2015
@QuLogic QuLogic deleted the py3k-hash branch September 10, 2015 07:24
@QuLogic QuLogic modified the milestone: v1.9 Sep 12, 2015
@pelson
Copy link
Member

pelson commented Sep 14, 2015

👍 - nice solution

@pp-mo
Copy link
Member

pp-mo commented May 17, 2017

Re-investigated : this is still an issue.
It does not seem critical to merge after all, but it does affect other areas especially where we put Coords into sets, e.g. these:

Updated code comments to reflect it #2553.
Notes -- see: #2553 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants