An apply on a DataFrame along axis=1 breaks aggregations on groupby's of that DataFrame #3480

Closed
gkoller opened this issue Apr 29, 2013 · 4 comments · Fixed by #3502
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions

gkoller commented Apr 29, 2013

This seems to be a regression from 0.10.1 to 0.11.0

This works:

import numpy as np
from pandas import DataFrame

# Group on the string column and aggregate the float column.
df = DataFrame({'foo1': ['one', 'two', 'two', 'three', 'one', 'two'],
                'foo2': np.random.randn(6)})
grouped = df.groupby('foo1')
grouped.agg(['mean'])

This does not (notice the apply):

import numpy as np
from pandas import DataFrame

df = DataFrame({'foo1': ['one', 'two', 'two', 'three', 'one', 'two'],
                'foo2': np.random.randn(6)})
# Identity apply along axis=1; this is the only difference from the working case.
df = df.apply(lambda x: x, axis=1)
grouped = df.groupby('foo1')
grouped.agg(['mean'])

The error raised is:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-47-4df234603be2> in <module>()
----> 1 grouped.agg(['mean'])

/Users/gkoller/.virtualenvs/abo/lib/python2.6/site-packages/pandas/core/groupby.pyc in agg(self, func, *args, **kwargs)
    336     @Appender(_agg_doc)
    337     def agg(self, func, *args, **kwargs):
--> 338         return self.aggregate(func, *args, **kwargs)
    339 
    340     def _iterate_slices(self):

/Users/gkoller/.virtualenvs/abo/lib/python2.6/site-packages/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs)
   1688                 result = DataFrame(result)
   1689         elif isinstance(arg, list):
-> 1690             return self._aggregate_multiple_funcs(arg)
   1691         else:
   1692             cyfunc = _intercept_cython(arg)

/Users/gkoller/.virtualenvs/abo/lib/python2.6/site-packages/pandas/core/groupby.pyc in _aggregate_multiple_funcs(self, arg)
   1736             except SpecificationError:
   1737                 raise
-> 1738         result = concat(results, keys=keys, axis=1)
   1739 
   1740         return result

/Users/gkoller/.virtualenvs/abo/lib/python2.6/site-packages/pandas/tools/merge.pyc in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity)
    870                        ignore_index=ignore_index, join=join,
    871                        keys=keys, levels=levels, names=names,
--> 872                        verify_integrity=verify_integrity)
    873     return op.get_result()
    874 

/Users/gkoller/.virtualenvs/abo/lib/python2.6/site-packages/pandas/tools/merge.pyc in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity)
    911 
    912         if len(objs) == 0:
--> 913             raise Exception('All objects passed were None')
    914 
    915         # consolidate data

Exception: All objects passed were None

jreback commented Apr 29, 2013

This doesn't work because:

In [8]: df.dtypes
Out[8]: 
foo1     object
foo2    float64
dtype: object

In [9]: df.apply(lambda x: x,axis=1).dtypes
Out[9]: 
foo1    object
foo2    object
dtype: object

If you think about it, you are aggregating across the columns, which are of mixed dtypes, so your output will be of dtype object. You can do this to convert back:

In [10]: df.apply(lambda x: x,axis=1).convert_objects()
Out[10]: 
    foo1      foo2
0    one -0.849480
1    two  0.038060
2    two  0.714368
3  three  0.522911
4    one -1.706384
5    two -0.694232

In [11]: df.apply(lambda x: x,axis=1).convert_objects().dtypes
Out[11]: 
foo1     object
foo2    float64
dtype: object

I am not sure this is a bug. We try to use the original dtypes after an apply, but it is not always possible.
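
To make the dtype point concrete, here is a minimal sketch. It rests on two assumptions that are not from this thread: the object column is forced with `astype` so that the example does not depend on how a given pandas version handles `apply`, and `infer_objects()` is used as a stand-in for `convert_objects()`, which as far as I know is no longer available in recent pandas releases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foo1': ['one', 'two', 'two', 'three', 'one', 'two'],
                   'foo2': np.random.randn(6)})

# A single row spans a string column and a float column, so the row
# Series has to fall back to the common dtype, object.
row = df.iloc[0]
print(row.dtype)

# Simulate the post-apply frame from this thread by forcing foo2 to object
# (assumption: this mimics the 0.11.0 result shown above).
as_object = df.astype({'foo2': object})
print(as_object['foo2'].dtype)

# infer_objects() attempts a soft conversion of object columns back to a
# more specific dtype (assumed stand-in for convert_objects()).
restored = as_object.infer_objects()
print(restored['foo2'].dtype)
```

Each row passed to the function under `axis=1` looks like `row` here, which is why reassembling the rows can leave every column as object unless the dtypes are inferred again afterwards.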


gkoller commented May 1, 2013

Thank you for the clear explanation. Given this explanation, I'm not sure this is a bug either; the behavior seems logical and predictable.

However, it did work in version 0.10.1 (I just verified it once more), so I presume 0.10.1 did the conversion automatically. The change in behavior in 0.11.0 broke existing code. If it isn't a regression, it should arguably be documented as a backwards-incompatible change.
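
For code that has to keep working regardless of whether the conversion happens automatically, one possible workaround is to re-infer the dtypes explicitly before grouping. This is only a sketch: the fixed `foo2` values are made up so the means are easy to check, and `infer_objects()` is an assumption standing in for the `convert_objects()` call mentioned earlier in the thread. On releases where apply already restores the dtypes, the extra call should simply change nothing.

```python
import pandas as pd

# Hypothetical fixed values instead of random data, so results are predictable.
df = pd.DataFrame({'foo1': ['one', 'two', 'two', 'three', 'one', 'two'],
                   'foo2': [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]})

# Row-wise identity apply, as in the report.
applied = df.apply(lambda row: row, axis=1)

# Re-infer dtypes before grouping so foo2 is numeric whether or not
# this pandas version restored it automatically.
result = applied.infer_objects().groupby('foo1').agg(['mean'])
print(result)
```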


jreback commented May 1, 2013

On second thought, I think this is easily fixable...


jreback commented May 1, 2013

This turned out to be trivial... fixed in #3502, thanks for the catch.

jreback closed this as completed May 1, 2013