An apply on a DataFrame along axis=1 breaks aggregations on groupby's of that DataFrame #3480

gkoller · 2013-04-29T11:18:59Z

This seems to be a regression from 0.10.1 to 0.11.0

This works:

import numpy as np
from pandas import DataFrame
df = DataFrame({'foo1' : ['one', 'two', 'two', 'three', 'one', 'two'],
                'foo2' : np.random.randn(6)})
grouped = df.groupby('foo1')
grouped.agg(['mean'])

This does not (notice the apply):

import numpy as np
from pandas import DataFrame
df = DataFrame({'foo1' : ['one', 'two', 'two', 'three', 'one', 'two'],
                'foo2' : np.random.randn(6)})
df = df.apply(lambda x: x, axis=1)
grouped = df.groupby('foo1')
grouped.agg(['mean'])

The error raised is:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-47-4df234603be2> in <module>()
----> 1 grouped.agg(['mean'])

/Users/gkoller/.virtualenvs/abo/lib/python2.6/site-packages/pandas/core/groupby.pyc in agg(self, func, *args, **kwargs)
    336     @Appender(_agg_doc)
    337     def agg(self, func, *args, **kwargs):
--> 338         return self.aggregate(func, *args, **kwargs)
    339 
    340     def _iterate_slices(self):

/Users/gkoller/.virtualenvs/abo/lib/python2.6/site-packages/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs)
   1688                 result = DataFrame(result)
   1689         elif isinstance(arg, list):
-> 1690             return self._aggregate_multiple_funcs(arg)
   1691         else:
   1692             cyfunc = _intercept_cython(arg)

/Users/gkoller/.virtualenvs/abo/lib/python2.6/site-packages/pandas/core/groupby.pyc in _aggregate_multiple_funcs(self, arg)
   1736             except SpecificationError:
   1737                 raise
-> 1738         result = concat(results, keys=keys, axis=1)
   1739 
   1740         return result

/Users/gkoller/.virtualenvs/abo/lib/python2.6/site-packages/pandas/tools/merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity)
    870                        ignore_index=ignore_index, join=join,
    871                        keys=keys, levels=levels, names=names,
--> 872                        verify_integrity=verify_integrity)
    873     return op.get_result()
    874 

/Users/gkoller/.virtualenvs/abo/lib/python2.6/site-packages/pandas/tools/merge.pyc in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity)
    911 
    912         if len(objs) == 0:
--> 913             raise Exception('All objects passed were None')
    914 
    915         # consolidate data

Exception: All objects passed were None

jreback · 2013-04-29T13:17:02Z

this doesn't work because:

In [8]: df.dtypes
Out[8]: 
foo1     object
foo2    float64
dtype: object

In [9]: df.apply(lambda x: x,axis=1).dtypes
Out[9]: 
foo1    object
foo2    object
dtype: object

If you think about it, you are aggregating across the columns, which are of mixed dtypes, so your output will be of dtype object. You can do this to convert back:

In [10]: df.apply(lambda x: x,axis=1).convert_objects()
Out[10]: 
    foo1      foo2
0    one -0.849480
1    two  0.038060
2    two  0.714368
3  three  0.522911
4    one -1.706384
5    two -0.694232

In [11]: df.apply(lambda x: x,axis=1).convert_objects().dtypes
Out[11]: 
foo1     object
foo2    float64
dtype: object

I am not sure this is a bug. We try to use the original dtypes after an apply, but it is not always possible.
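
For readers on later pandas releases, here is a minimal sketch of the same dtype round trip. It rests on two assumptions that are not part of the thread: that `convert_objects()` is no longer available in the installed version, and that `infer_objects()` is an acceptable stand-in for it. The object-dtype frame is built with `astype(object)` rather than with the row-wise apply, so the example does not depend on how a given version's `apply` infers dtypes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "foo1": ["one", "two", "two", "three", "one", "two"],
    "foo2": np.random.randn(6),
})

# Simulate the frame described above: every column stored as object dtype.
# (Assumption: this stands in for the result of the axis=1 apply in 0.11.0.)
as_object = df.astype(object)

# Ask pandas to re-infer a better dtype for each object column.
# (Assumption: infer_objects() is used here in place of convert_objects().)
restored = as_object.infer_objects()

# With foo2 back to float64, the multi-function aggregation has a numeric
# column to work on.
result = restored.groupby("foo1").agg(["mean"])
```

Checking `restored.dtypes` before grouping is a cheap way to confirm that the numeric column really came back as `float64`.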

gkoller · 2013-05-01T05:35:52Z

Thank you for the clear explanation. Given this explanation, I'm not sure this is a bug either; the behavior seems logical and predictable.

However, it did work in version 0.10.1 (I just verified it once more), so I presume version 0.10.1 did the conversion automatically. The change in behavior in 0.11.0 broke existing code. If it isn't a regression, it should arguably be documented as a backwards-incompatible change.

jreback · 2013-05-01T09:57:23Z

On second thought, I think this is easily fixable.

jreback · 2013-05-01T14:27:20Z

This turned out to be trivial. Fixed in #3502; thanks for the catch.

jreback mentioned this issue May 1, 2013

BUG: GH3480 Fix regression in a DataFrame apply with axis=1 #3502

Merged

jreback closed this as completed May 1, 2013
