ENH: add MultiIndex.to_dataframe #15216

Closed
wants to merge 1 commit into from

Conversation

jreback
Contributor

@jreback jreback commented Jan 24, 2017

ENH: allow hashing of MultiIndex

closes #12397

@jreback
Contributor Author

jreback commented Jan 24, 2017

cc @mrocklin
cc @jcrist

also adds hashing of MultiIndex.

@jreback
Contributor Author

jreback commented Jan 24, 2017

In [52]: i = pd.MultiIndex.from_tuples([(118, 472), (236, 118), (51, 204), (102, 51)])

In [53]: i
Out[53]: 
MultiIndex(levels=[[51, 102, 118, 236], [51, 118, 204, 472]],
           labels=[[2, 3, 0, 1], [3, 1, 2, 0]])

In [55]: i.to_dataframe(index=False)
Out[55]: 
     0    1
0  118  472
1  236  118
2   51  204
3  102   51

In [56]: from pandas.tools.hashing import hash_pandas_object

In [57]: hash_pandas_object(i.to_dataframe(index=False), index=False)
Out[57]: 
0    11950414010286087598
1    11950414010286087598
2    10472907816967777234
3    10472907816967777234
dtype: uint64

In [58]: hash_pandas_object(i.to_dataframe(index=False), index=True)
Out[58]: 
0    17404497957148711178
1     5195826631379738351
2    10365020365066803200
3    15157173997208611942
dtype: uint64

Odd that [57] can produce duplicates (rows 0 and 1 hash identically, as do rows 2 and 3). Any thoughts, @mikegraham?
(In practice I don't think this matters, as we normally also include the index, which then makes these unique.)

But I have a case where I just want to uniquely hash the values (no index).
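
For anyone who wants to re-run the values-only hashing check above, here is a minimal sketch. It assumes a recent pandas release, where the method is exposed as `MultiIndex.to_frame` (the rename agreed later in this thread) and `hash_pandas_object` is importable from `pandas.util`; the helper name `hash_values_only` is hypothetical. Whether any rows collide depends on the pandas version, so the sketch only counts duplicates and does not assume a result.

```python
import pandas as pd
from pandas.util import hash_pandas_object


def hash_values_only(mi):
    """Hash each row of a MultiIndex, ignoring any index labels.

    Hypothetical helper: converts the MultiIndex to a DataFrame with a
    default RangeIndex, then hashes only the column values.
    """
    frame = mi.to_frame(index=False)
    return hash_pandas_object(frame, index=False)


mi = pd.MultiIndex.from_tuples([(118, 472), (236, 118), (51, 204), (102, 51)])
hashes = hash_values_only(mi)

# Count rows whose hash repeats an earlier row's hash.
n_collisions = int(hashes.duplicated().sum())

print(hashes)
print("colliding rows:", n_collisions)
```

Counting with `duplicated()` keeps the check independent of the specific hash values, which are not guaranteed to match the ones printed in this comment.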

@codecov-io

codecov-io commented Jan 25, 2017

Current coverage is 86.30% (diff: 100%)

No coverage report found for master at ba05744.

Powered by Codecov. Last update ba05744...4a151c6

@jreback
Contributor Author

jreback commented Jan 25, 2017

@jorisvandenbossche ok with this? (I need to build on top of this for other things).....

@jorisvandenbossche
Member

Yes, looks good. Only thing I am wondering is the name. We already have Series.to_frame as well, so it would be nice to be consistent here (although I think I like to_dataframe more ..).

Also wondering whether this should be restricted to MultiIndex rather than being available for Index in general (but that can certainly go in a follow-up PR if we want that).

@jreback
Contributor Author

jreback commented Jan 25, 2017

sure, will change to .to_frame()

not sure if we should add this to Series, it doesn't really make sense there :>

@jorisvandenbossche
Member

not sure if we should add this to Series, it doesn't really make sense there :>

I wrote Index, not Series (but maybe that's what you meant :-)). Index already has a to_series method, which is more logical for an Index, but the main motivation would be to just not create more distinction in api between single/multi index (eg to not always have to check the number of levels of your index when writing generic code, cfr #3268)
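
To illustrate the point about generic code, here is a small sketch. It assumes a pandas version in which both `Index` and `MultiIndex` expose `to_frame` with an `index` argument (true of recent releases, but not of pandas at the time of this PR); the helper name `index_columns` is hypothetical.

```python
import pandas as pd


def index_columns(idx):
    """Return the index values as DataFrame columns.

    Hypothetical helper: there is no branch on idx.nlevels, because the
    same call works for a flat Index and for a MultiIndex.
    """
    return idx.to_frame(index=False)


flat = pd.Index([10, 20, 30], name="a")
multi = pd.MultiIndex.from_tuples([(1, "x"), (2, "y")], names=["a", "b"])

flat_frame = index_columns(flat)    # one column, named "a"
multi_frame = index_columns(multi)  # two columns, named "a" and "b"

print(flat_frame)
print(multi_frame)
```

With a shared method, callers do not need to check the number of levels before converting an index.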

@jreback
Contributor Author

jreback commented Jan 25, 2017

@jorisvandenbossche yeah that is a fair point, we can revisit, i'll create an issue.

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
closes pandas-dev#12397

Author: Jeff Reback <jeff@reback.net>

Closes pandas-dev#15216 from jreback/to_dataframe and squashes the following commits:

b744fb5 [Jeff Reback] ENH: add MultiIndex.to_dataframe
Development

Successfully merging this pull request may close these issues.

Add to_dataframe() method to MultiIndex
3 participants