Add to_dataframe() method to MultiIndex #12397

relativistic · 2016-02-19T21:44:56Z

I find the Index.to_series method is a convenient way to allow indices to act as columns of a dataframe where desired. However, the behavior of MultiIndex.to_series, which gives a Series of tuples, is less useful.

Would it be convenient to provide a to_dataframe method for index classes? This would be a natural extension of the utility of to_series, and more useful for MultiIndex objects I would think.

I'm something equivalent to:

def to_dataframe(self):
   DataFrame(self.tolist(), columns=self.names, index=self)

The text was updated successfully, but these errors were encountered:

jreback · 2016-02-22T14:13:45Z

can you give an actual usecase?

normally you would simply do .reset_index()

toobaz · 2016-03-04T11:22:16Z

I just searched for this. I have a DF filled with boolean values (an adjacency matrix). I want to store the pairs of labels for which the value is True. I start by doing a simple

df = df.unstack()
df = df[df == True]

now I would like to do

df.to_dataframe().to_csv(path)

instead I need to do

df.reset_index()[['indexname1', 'indexname2']].to_csv(path)

(admittedly, not a huge problem, but on the other hand to_series() itself could easily be replaced by .reset_index()['indexname'] I think; to_dataframe seems to me at least as useful)

relativistic · 2016-03-29T18:22:03Z

Sorry, missed the notification with @jreback 's comment.

One of the nice things about to_series() is that it maintains the original index, which is useful if you need to interact with the original dataframe. reset_index() does not do this. To get what I want with reset_index(), I would need to do something like:

index_names = df.index.names
df_indx = df.reset_index()[index_names]
df_indx.index = df.index

This is a bit verbose, but admittedly, not bad. The main problem is that this is fragile, and will not work on all DataFrames. It would fail if:

any string in index_names has a duplicate in df.columns .
The index is unnamed

Any one of these problems is relatively easy to get around, if you know a priori the construction of the dataframe. However, its a bit more effort if you want to write a general-purpose function that is guarantied to work in all conditions. And even in the case where you have some idea of your input, it would simply be nice if pandas could deal with this for you so you can focus data analysis, rather than how to wrangle it into the right format.

jreback · 2016-03-29T18:29:44Z

@relativistic again a fully worked out example here would be instructive. .reset_index() DOES preserve the index as its now in a column(s). So not sure what you mean.

relativistic · 2016-03-29T18:59:23Z

@jreback : Okay, sure. I'll try to pull together an example some evening soon. Can't show the code I was working on unfortunately, so I'll have to come up with another example.

jreback · 2016-11-30T18:12:14Z

here is an external impl (e.g. could prob be faster / better if we did this inside of a .to_dataframe()) method

In [24]: i = pd.MultiIndex.from_product([np.arange(1000),np.arange(1000)],names=['one','two'])

In [25]: len(i)
Out[25]: 1000000

In [26]: %timeit DataFrame(i.tolist(), columns=i.names)
1 loop, best of 3: 612 ms per loop

In [27]: %timeit DataFrame([], index=i).reset_index()
10 loops, best of 3: 38.5 ms per loop

toobaz · 2016-11-30T18:54:48Z

Just for reference: at the time, I ended up doing:

In [4]: %timeit pd.DataFrame({i.names[n] or n : i.levels[n][i.labels[n]]
                              for n in range(len(i.names))})
100 loops, best of 3: 16.4 ms per loop

which performs slightly better than (same as above, just to compare on the same CPU):

In [5]: %timeit pd.DataFrame([], index=i).reset_index()
10 loops, best of 3: 21.5 ms per loop

jreback · 2016-11-30T19:30:17Z

@toobaz that's only true for a relatively small frame

ENH: allow hashing of MultiIndex closes pandas-dev#12397

closes pandas-dev#12397

closes pandas-dev#12397 Author: Jeff Reback <jeff@reback.net> Closes pandas-dev#15216 from jreback/to_dataframe and squashes the following commits: b744fb5 [Jeff Reback] ENH: add MultiIndex.to_dataframe

jreback added API Design MultiIndex labels Feb 22, 2016

jreback added a commit to jreback/pandas that referenced this issue Jan 24, 2017

ENH: add MultiIndex.to_dataframe

90b8588

ENH: allow hashing of MultiIndex closes pandas-dev#12397

jreback added this to the 0.20.0 milestone Jan 24, 2017

jreback added a commit to jreback/pandas that referenced this issue Jan 24, 2017

ENH: add MultiIndex.to_dataframe

396b9db

ENH: allow hashing of MultiIndex closes pandas-dev#12397

jreback mentioned this issue Jan 24, 2017

ENH: add MultiIndex.to_dataframe #15216

Closed

jreback added a commit to jreback/pandas that referenced this issue Jan 24, 2017

ENH: add MultiIndex.to_dataframe

7193277

ENH: allow hashing of MultiIndex closes pandas-dev#12397

jreback added a commit to jreback/pandas that referenced this issue Jan 25, 2017

ENH: add MultiIndex.to_dataframe

4a151c6

closes pandas-dev#12397

jreback added a commit to jreback/pandas that referenced this issue Jan 25, 2017

ENH: add MultiIndex.to_dataframe

c10cad4

closes pandas-dev#12397

jreback added a commit to jreback/pandas that referenced this issue Jan 25, 2017

ENH: add MultiIndex.to_dataframe

595e5e8

closes pandas-dev#12397

jreback added a commit to jreback/pandas that referenced this issue Jan 25, 2017

ENH: add MultiIndex.to_dataframe

b744fb5

closes pandas-dev#12397

jreback closed this as completed in 7277459 Jan 25, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add to_dataframe() method to MultiIndex #12397

Add to_dataframe() method to MultiIndex #12397

relativistic commented Feb 19, 2016

jreback commented Feb 22, 2016

toobaz commented Mar 4, 2016

relativistic commented Mar 29, 2016

jreback commented Mar 29, 2016

relativistic commented Mar 29, 2016

jreback commented Nov 30, 2016

toobaz commented Nov 30, 2016

jreback commented Nov 30, 2016

Add to_dataframe() method to MultiIndex #12397

Add to_dataframe() method to MultiIndex #12397

Comments

relativistic commented Feb 19, 2016

jreback commented Feb 22, 2016

toobaz commented Mar 4, 2016

relativistic commented Mar 29, 2016

jreback commented Mar 29, 2016

relativistic commented Mar 29, 2016

jreback commented Nov 30, 2016

toobaz commented Nov 30, 2016

jreback commented Nov 30, 2016