ENH: Melt with MultiIndex columns #4150

Merged
merged 3 commits into from
Jul 12, 2013

Conversation

hayd
Copy link
Contributor

@hayd hayd commented Jul 6, 2013

No idea if there's actually a market for this, but anyhow:

In [1]: df = pd.DataFrame([[ 1.067683, -1.110463,  0.20867 ], [-1.321405,  0.368915, -1.055342], [-0.807333,  0.08298 , -0.873361]])

In [2]: df.columns = [list('ABC'), list('abc')]

In [3]: df.columns.names = ['CAP', 'low']

In [4]: df
Out[4]:
CAP         A         B         C
low         a         b         c
0    1.067683 -1.110463  0.208670
1   -1.321405  0.368915 -1.055342
2   -0.807333  0.082980 -0.873361

In [5]: pd.melt(df, col_level=0)
Out[5]:
  CAP     value
0   A  1.067683
1   A -1.321405
2   A -0.807333
3   B -1.110463
4   B  0.368915
5   B  0.082980
6   C  0.208670
7   C -1.055342
8   C -0.873361

In [6]: pd.melt(df,)
Out[6]:
  CAP low     value
0   A   a  1.067683
1   A   a -1.321405
2   A   a -0.807333
3   B   b -1.110463
4   B   b  0.368915
5   B   b  0.082980
6   C   c  0.208670
7   C   c -1.055342
8   C   c -0.873361

In [7]: df.columns.names = [None, None]

In [8]: pd.melt(df,)
Out[8]:
  variable_0 variable_1     value
0          A          a  1.067683
1          A          a -1.321405
2          A          a -0.807333
3          B          b -1.110463
4          B          b  0.368915
5          B          b  0.082980
6          C          c  0.208670
7          C          c -1.055342
8          C          c -0.873361

Also includes fix for get_level_values and name attribute.

cc #4144

@hayd
Copy link
Contributor Author

hayd commented Jul 6, 2013

tbh I'm not 100% if this is the right behaviour... thoughts?

@cpcloud
Copy link
Member

cpcloud commented Jul 6, 2013

slightly OT: is there a corresponding freeze or unmelt method?

@cpcloud
Copy link
Member

cpcloud commented Jul 6, 2013

using df.set_index might work i guess, but might be useful to have a wrapper around it? not sure

i guess pivot tables are really unmelting...so maybe not that useful
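To make the "pivot is really unmelting" point concrete, here is a minimal sketch of a round trip. The frame and its column names (`id`, `A`, `B`) are made up for illustration, and it assumes every `id`/`variable` pair is unique, which `pivot` needs.

```python
import pandas as pd

# Hypothetical wide frame; "id" identifies each row.
wide = pd.DataFrame({"id": [1, 2, 3], "A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

# Wide -> long: one row per (id, variable) pair.
molten = pd.melt(wide, id_vars=["id"])

# Long -> wide: pivot puts each variable back into its own column.
back = molten.pivot(index="id", columns="variable", values="value")

# Tidy up so the result has the same layout as the input.
restored = back.reset_index()
restored.columns.name = None
print(restored)
```

The `reset_index` and `columns.name = None` steps are only there because `pivot` returns the id as the index and labels the column axis `variable`.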

@hayd
Copy link
Contributor Author

hayd commented Jul 6, 2013

@cpcloud docstring says:

"Unpivots" a DataFrame from wide format to long format...

So I guess pivot is the inverse (why it's not just called unpivot I don't know)... weirdly docs create an unpivot function here.

@hayd
Copy link
Contributor Author

hayd commented Jul 6, 2013

No idea at all why it's called melt...

@jreback
Copy link
Contributor

jreback commented Jul 7, 2013

not that I know anything about R, but that's where it comes from......

@hayd
Copy link
Contributor Author

hayd commented Jul 7, 2013

@jreback that's one step back in its history, would be nice to know where it originated from... maybe I'll ask on SO, no doubt I'll be downvoted into oblivion.

@jtratner
Copy link
Contributor

jtratner commented Jul 9, 2013

@hayd I may be in the minority, but I think melt is a nicely descriptive name, kinda like an unstack that's so intense it melted the whole frame away 👅

@jreback
Copy link
Contributor

jreback commented Jul 10, 2013

or 0.12?

@hayd
Copy link
Contributor Author

hayd commented Jul 10, 2013

@jreback What do you reckon to this behaviour though? Does it make sense?

@jtratner but where does the name come from? ;(

@cpcloud
Copy link
Member

cpcloud commented Jul 10, 2013

@hayd i can ask on SO if you want i only have 331 rep so oblivion is a jump, skip and a hop away.

@hayd
Copy link
Contributor Author

hayd commented Jul 10, 2013

@cpcloud Up to you, I had penned a partial question (but I'd need to research more before posting, so may leave for another day)... I'll upvote if you beat me to it :p

I'm thinking it's called that because it's the opposite of "cast" (though no idea if there is a difference between cast and pivot) http://www.statmethods.net/management/reshape.html :s

@jreback
Copy link
Contributor

jreback commented Jul 10, 2013

@hayd it looks ok to me....I don't really use this stuff... @wesm ?

@cpcloud
Copy link
Member

cpcloud commented Jul 10, 2013

i like how this is in the R docs for cast.

Along with ‘melt’ and recast, this is the only function you should
ever need to use.

Everything in R can be done with exactly 3 functions.

@cpcloud
Copy link
Member

cpcloud commented Jul 10, 2013

@hayd I think you're right there's all sorts of talk about "molten" data.frames in the R docs.

@hayd
Copy link
Contributor Author

hayd commented Jul 10, 2013

@cpcloud personally think it's a rather confusing/strange analogy...

@cpcloud
Copy link
Member

cpcloud commented Jul 10, 2013

@hayd Well, you know, R has to keep the barrier to entry very high by using obscure analogies

@jreback
Copy link
Contributor

jreback commented Jul 10, 2013

@hayd are you suggesting that we change the name for melt?

@cpcloud
Copy link
Member

cpcloud commented Jul 10, 2013

personally i think melt should stay for back compat. could alias unpivot to it

@jreback
Copy link
Contributor

jreback commented Jul 10, 2013

I think melt is pretty descriptive...

@cpcloud
Copy link
Member

cpcloud commented Jul 10, 2013

how are you thinking about it? or do i detect sarcasm over github? i think of it like a "generalized-for-labels" ravel which i think is a much better name..

i actually don't use this much but now i'm starting to see where this might be useful in my own work...
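The "ravel with labels" reading can be checked directly. This is only a sketch on a made-up 2x2 frame: the `value` column of the melted frame matches a column-major ravel of the data, and the `variable` column is each column label repeated once per row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
molten = pd.melt(df)

# melt walks the frame column by column, i.e. Fortran (column-major) order.
values_match = np.array_equal(molten["value"].to_numpy(),
                              df.to_numpy().ravel(order="F"))

# Each column label is repeated len(df) times to line up with its values.
labels = np.repeat(df.columns.to_numpy(), len(df))
labels_match = molten["variable"].tolist() == labels.tolist()

print(values_match, labels_match)
```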

@hayd
Copy link
Contributor Author

hayd commented Jul 10, 2013

I wouldn't be unhappy to see an unpivot alias for melt (or vice versa), but then should cast be an alias for pivot (or pivot_table)? shudders

@jreback
Copy link
Contributor

jreback commented Jul 10, 2013

I am -1 on cast I think its very confusing, but +1 on unpivot

@cpcloud
Copy link
Member

cpcloud commented Jul 10, 2013

cast is a horrible name

@jreback
Copy link
Contributor

jreback commented Jul 10, 2013

what was R thinking?

@cpcloud
Copy link
Member

cpcloud commented Jul 10, 2013

as confusing as possible, but no more

@jreback
Copy link
Contributor

jreback commented Jul 10, 2013

Well written documentation

Basically, you "melt" data so that each row is a unique id-variable combination. 
Then you "cast" the melted data into any shape you would like. Here is a very simple example.

There is much more that you can do with the melt( ) and cast( ) functions. 
See the documentation for more details.

@cpcloud
Copy link
Member

cpcloud commented Jul 10, 2013

"shape-casting" should not be called casting, it should be called reshaping (strangely enough, that's the name of the package containing cast) and type casting should be called cast much more consistent with the history of every other programming language.

@cpcloud
Copy link
Member

cpcloud commented Jul 10, 2013

okay so then deprecate melt and pivot and move them to frame methods

@cpcloud
Copy link
Member

cpcloud commented Jul 10, 2013

pivot is already there

@jreback
Copy link
Contributor

jreback commented Jul 10, 2013

yes..its just pd.pivot that's an issue

@hayd
Copy link
Contributor Author

hayd commented Jul 10, 2013

Will put this together (either as separate pr or on the top of this... or will push on top).

@hayd
Copy link
Contributor Author

hayd commented Jul 10, 2013

Think I should also add a note about pivot vs pivot_table (is melt the inverse of pivot or pivot_table?) Will I sometimes not be able to roundtrip with pivot?

re this question/answer, not clear to me the purpose of pivot... strictness I suppose.

@jreback
Copy link
Contributor

jreback commented Jul 10, 2013

I believe pivot_table is just a generalization of pivot

(and pivot_table is already an instance method of frame as well)

@hayd
Copy link
Contributor Author

hayd commented Jul 10, 2013

Yeah that's what I think too (I think pivot requires every entry to exist, whereas pivot_table can live with missing data). The question I have is: does melt always return something which can be pivoted?
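One difference between the two that is easy to check is how they treat duplicate index/column pairs: `pivot` raises, while `pivot_table` aggregates. The sketch below uses a made-up long frame; whether this is the strictness meant above is an assumption, not something the thread settles.

```python
import pandas as pd

# Hypothetical long frame with a duplicated (row=0, variable="a") pair.
long = pd.DataFrame({
    "row": [0, 0, 1],
    "variable": ["a", "a", "b"],
    "value": [1.0, 3.0, 2.0],
})

# pivot cannot decide which of the duplicates to keep, so it raises.
try:
    long.pivot(index="row", columns="variable", values="value")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates (here with the mean) and leaves
# combinations that never occurred as NaN.
table = long.pivot_table(index="row", columns="variable",
                         values="value", aggfunc="mean")
print(pivot_failed)
print(table)
```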

@jreback
Copy link
Contributor

jreback commented Jul 10, 2013

melt/unpivot effectively drops the all-nan rows I believe? (maybe have an option dropna=True) that could be set

I think the identity is not always true (it COULD be but is not required to be true).....

then again I could be FOS

@hayd
Copy link
Contributor Author

hayd commented Jul 10, 2013

pd.pivot works with numpy arrays so maybe it makes sense there.

melt doesn't drop the NAs:

In [13]: df = pd.DataFrame([[1,2], [3, np.nan]])

In [14]: pd.melt(df)
Out[14]:
   variable  value
0         0      1
1         0      3
2         1      2
3         1    NaN

So it is the inverse of pivot, agree to dropna option (but 0.12 may be optimistic for that bit) in which case pivot_table would be needed to invert it.
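The same check as a small script, for anyone who wants to run it outside IPython. It only restates the session above: the melted frame has one row per cell, the NaN is kept, and dropping it is an explicit extra step.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, np.nan]])
molten = pd.melt(df)

# One output row per cell of the input, missing or not.
n_rows, n_cols = df.shape
kept_all = len(molten) == n_rows * n_cols

# The single NaN from the input is still there.
n_missing = int(molten["value"].isna().sum())

# Dropping missing values has to be asked for explicitly.
without_na = molten.dropna(subset=["value"])
print(kept_all, n_missing, len(without_na))
```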

@cpcloud
Copy link
Member

cpcloud commented Jul 10, 2013

Just to come full circle and be clear I would really like melting of multi-index frames.

@jreback
Copy link
Contributor

jreback commented Jul 10, 2013

ok...so maybe for 0.12

  • add melt for mi
  • alias unpivot to melt (and deprecate melt)
  • add unpivot as instance method to frame

for 0.13

figure out when roundtripping pivot/unpivot works/makes sense and how pivot_table fits in all this?

@hayd
Copy link
Contributor Author

hayd commented Jul 10, 2013

@cpcloud ok, shall we just merge this one now then? ... I think the suggested behaviour makes sense.

@cpcloud
Copy link
Member

cpcloud commented Jul 10, 2013

i can't play with it at this exact moment so if u can wait a couple of hours that would be great otherwise if others think it's ok then merge away!

@wesm
Copy link
Member

wesm commented Jul 10, 2013

I don't see a reason to deprecate melt.

There is indeed quite a bit you can do with the reshape2 package that you can't do as easily with pandas. There are also things that aren't easy to do in reshape2 that are easy to do in pandas. It would be nice to address the shortcomings.

@cpcloud
Copy link
Member

cpcloud commented Jul 10, 2013

@wesm care to elaborate for those of us (like myself) who aren't that familiar with R? Links to examples might be nice as well, then we can open up some issues. Thanks!

@wesm
Copy link
Member

wesm commented Jul 10, 2013

Recommend reading the reshape2 documentation

@jreback
Copy link
Contributor

jreback commented Jul 10, 2013

it's just a deprecation of the name melt in favor of unpivot


if isinstance(frame.columns, MultiIndex):
    for i, col in enumerate(var_name):
        mdata[col] = np.asarray(frame.columns.get_level_values(i)).repeat(N)
Copy link
Member

it seems like you might be able to remove the else below in favor of this, it looks more general, no?

Copy link
Contributor Author

True, just wondering how obvious this is (if there are any edge cases), suppose would be good to allow list for an index... not sure. Will try and have another look tomorrow..
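A rough sketch of why the MultiIndex branch looks general enough to cover the flat case too: `get_level_values` also exists on a plain `Index`, so one loop over the levels handles both. The function name `melt_sketch` and the `variable_<i>` fallback naming are assumptions for illustration; this is not the code in the PR, and it ignores `id_vars`, `value_vars` and `col_level`.

```python
import numpy as np
import pandas as pd

def melt_sketch(frame, value_name="value"):
    """Illustrative only: melt every column, one label column per level."""
    columns = frame.columns
    n_rows = len(frame)
    # Use the level names where present, else a positional placeholder.
    names = [name if name is not None else f"variable_{i}"
             for i, name in enumerate(columns.names)]
    data = {}
    for i, col in enumerate(names):
        # Works for Index (one level) and MultiIndex (several levels).
        data[col] = np.asarray(columns.get_level_values(i)).repeat(n_rows)
    # Values are laid out column by column to line up with the labels.
    data[value_name] = frame.to_numpy().ravel(order="F")
    return pd.DataFrame(data)

df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]])
df.columns = pd.MultiIndex.from_arrays([list("AB"), list("ab")],
                                       names=["CAP", "low"])
out = melt_sketch(df)
flat = melt_sketch(pd.DataFrame([[1, 2]], columns=["x", "y"]))
print(out)
```

Note that for an unnamed flat index this sketch produces `variable_0`, where `pd.melt` uses `variable`; that edge case is one the real implementation has to special-case.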

@cpcloud
Copy link
Member

cpcloud commented Jul 11, 2013

@hayd i went ahead and did it for u + some clean up of doctests

travis build:
https://travis-ci.org/cpcloud/pandas/builds/8973767

branch:
cpcloud@92dceb0

@jtratner
Copy link
Contributor

@jreback @cpcloud @hayd I too would prefer if melt were not deprecated. unpivot isn't necessarily more descriptive to me. Also, quick web search suggests that equivalent to R's melt in pandas is a not uncommon question for people new to pandas.

@hayd
Copy link
Contributor Author

hayd commented Jul 11, 2013

@cpcloud appended your commit to this pr

@jtratner I think there is a case for unpivot as an alias but melt is here to stay (or melt as an alias for pivot). :)

@cpcloud
Copy link
Member

cpcloud commented Jul 11, 2013

@hayd thanks!

@hayd
Copy link
Contributor Author

hayd commented Jul 11, 2013

@jreback already a note in release, sorry I accidentally deleted your question (past my bed-time)...

@jreback
Copy link
Contributor

jreback commented Jul 11, 2013

@hayd np.....I think looks good...basically just adding the mi-melt here, no rename/deprecate....can address in 0.13

@hayd
Copy link
Contributor Author

hayd commented Jul 12, 2013

will make a separate pr (too late for this kind of thing in 0.12 anyways). imo could have melt as is, but unpivot could have the additional feature of a dropna argument (i.e. melt is an alias for unpivot(dropna=False)). But will put together pr and we can argue there :)

Seems like no qualms about behaviour in the pr, so merge?
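For reference, the `unpivot(dropna=...)` idea could look roughly like the sketch below. `unpivot` here is a hypothetical helper written for this note, not a pandas function; it simply wraps `pd.melt` and optionally drops rows whose value is missing.

```python
import numpy as np
import pandas as pd

def unpivot(frame, dropna=False, **melt_kwargs):
    """Hypothetical wrapper around pd.melt; not part of pandas."""
    molten = pd.melt(frame, **melt_kwargs)
    if dropna:
        # Respect a custom value column name if the caller passed one.
        value_name = melt_kwargs.get("value_name", "value")
        molten = molten.dropna(subset=[value_name]).reset_index(drop=True)
    return molten

df = pd.DataFrame([[1, 2], [3, np.nan]])

# With dropna=False the wrapper is just melt.
same_as_melt = unpivot(df).equals(pd.melt(df))

# With dropna=True the row holding the NaN is removed.
compact = unpivot(df, dropna=True)
print(same_as_melt, len(compact))
```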

@jreback
Copy link
Contributor

jreback commented Jul 12, 2013

merge

hayd added a commit that referenced this pull request Jul 12, 2013
ENH: Melt with MultiIndex columns
@hayd hayd merged commit f4246fb into pandas-dev:master Jul 12, 2013
@cpcloud
Copy link
Member

cpcloud commented Aug 13, 2013

btw molten/melt and cast come from this paper i believe, or maybe from his phd thesis
