Bug/unexpected behaviour when using groupby and aggregation functions with DataFrame #1697

ichipper · 2012-07-27T21:03:25Z

Here is code to reproduce the bug/unexpected behavior:

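from pandas import DataFrame
from pandas import MultiIndex

# Column MultiIndex with three first-level labels (f1, f2, f3), two sub-columns each
midx = MultiIndex.from_tuples([('f1', 's1'), ('f1', 's2'), ('f2', 's1'), ('f2', 's2'), ('f3', 's1'), ('f3', 's2')])
df = DataFrame([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]], columns=midx)
# Keep only the f2 and f3 sub-blocks (DataFrame.select is 2012-era API, removed in pandas 1.0)
df1 = df.select(lambda u: u[0] in ['f2', 'f3'], axis=1)
# Group the columns by the first level of the column MultiIndex
df1_group = df1.groupby(axis=1, level=0)
print(df1_group.groups)
print(df1_group.sum())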
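The repro above uses the 2012-era API: `DataFrame.select` was removed in pandas 1.0 and `groupby(axis=1)` has been deprecated since pandas 2.1. A minimal sketch of the same steps on a recent pandas, assuming that selecting first-level labels with `df[["f2", "f3"]]` is an acceptable stand-in for the original `select` call and that grouping the transposed frame by `level=0` is equivalent to grouping columns. Since this issue was closed as fixed, the result here should contain only `f2` and `f3`:

```python
import pandas as pd

# Same frame as in the report: three first-level labels, two sub-columns each.
midx = pd.MultiIndex.from_tuples(
    [("f1", "s1"), ("f1", "s2"), ("f2", "s1"), ("f2", "s2"), ("f3", "s1"), ("f3", "s2")]
)
df = pd.DataFrame([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]], columns=midx)

# Stand-in for the removed df.select(...): pick the f2 and f3 sub-blocks.
df1 = df[["f2", "f3"]]

# Stand-in for the deprecated groupby(axis=1, level=0): transpose so the
# column MultiIndex becomes the row index, group by its first level, sum,
# and transpose back so f2/f3 are columns again.
result = df1.T.groupby(level=0).sum().T
print(result)
```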

When running the code, we can see that df is:

   f1      f2      f3
   s1  s2  s1  s2  s1  s2
0   1   2   3   4   5   6
1   7   8   9  10  11  12

df1, the sub-block of df holding only the f2 and f3 columns, is:

   f2      f3
   s1  s2  s1  s2
0   3   4   5   6
1   9  10  11  12

After grouping df1 by the first level of the column MultiIndex,
df1_group.groups is:

{'f2': [('f2', 's1'), ('f2', 's2')], 'f3': [('f3', 's1'), ('f3', 's2')]}

However, when sum is applied to aggregate the columns within each group, as in the example code,
df1_group.sum() returns:

    f1  f2  f3
0  NaN   7  11
1  NaN  19  23
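For comparison, the result one would expect (only f2 and f3, each the row-wise sum of its s1 and s2 sub-columns) can be computed without groupby at all. A minimal sketch, assuming a recent pandas where `df[["f2", "f3"]]` stands in for the original `df.select` call:

```python
import pandas as pd

midx = pd.MultiIndex.from_tuples(
    [("f1", "s1"), ("f1", "s2"), ("f2", "s1"), ("f2", "s2"), ("f3", "s1"), ("f3", "s2")]
)
df = pd.DataFrame([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]], columns=midx)
df1 = df[["f2", "f3"]]

# df1[key] drops the first level and returns the s1/s2 sub-block for that key;
# summing across its columns gives one value per row for that group.
expected = pd.DataFrame({key: df1[key].sum(axis=1) for key in ["f2", "f3"]})
print(expected)
```

This matches the f2 and f3 columns of the output above; the extra all-NaN f1 column is the unexpected part.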

It seems the aggregation uses the columns of df instead of those of df1, so the resulting DataFrame
includes the label 'f1', which doesn't exist in df1.
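One way to probe the explanation above on a recent pandas: a sliced column MultiIndex keeps all of its originally defined level values, so df1's columns still list 'f1' in `levels[0]` even though no column uses it. It is an assumption (not confirmed in this thread) that the 2012 groupby code built its result from these levels rather than from the labels actually present. The sketch also shows `MultiIndex.remove_unused_levels()` (available since pandas 0.20) as a way to drop the stale label:

```python
import pandas as pd

midx = pd.MultiIndex.from_tuples(
    [("f1", "s1"), ("f1", "s2"), ("f2", "s1"), ("f2", "s2"), ("f3", "s1"), ("f3", "s2")]
)
df = pd.DataFrame([[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]], columns=midx)
df1 = df[["f2", "f3"]]

# The defined level values still include 'f1' after slicing.
print(df1.columns.levels[0])
# The first-level labels actually used by df1's columns are only f2 and f3.
print(df1.columns.get_level_values(0).unique())

# Rebuild the column index without unused level values; set_axis returns a
# new frame and leaves df1 unchanged.
cleaned = df1.set_axis(df1.columns.remove_unused_levels(), axis=1)
print(cleaned.columns.levels[0])
```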

wesm closed this as completed in 48a3194 Aug 12, 2012