Pure Python GroupBy bug #618

yarikoptic · 2012-01-12T17:28:30Z

I have tried to find a related issue but failed... so pardon me if this is a duplicate:

ATM, if grouping doesn't actually produce all possible combinations of the keys, pandas spits out a non-informative error:

/home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/groupby.py in _aggregate_series_pure_python(self, obj, func, ngroups)
    431                     raise ValueError('function does not reduce')
    432 
--> 433             counts[label] = group.shape[0]
    434             result[label] = res
    435 

IndexError: index out of bounds
> /home/yoh/deb/gits/pkg-exppsy/pandas/pandas/core/groupby.py(433)_aggregate_series_pure_python()
    432 
--> 433             counts[label] = group.shape[0]
    434             result[label] = res

IMHO there could be an option to still handle those but place NaNs in those entries, OR at least raise an informative exception, something like "combination f1='x', f2='y' has no data entries in the original data", or something like that.
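A minimal sketch of the "place NaNs" behaviour using today's pandas API (this is not what pandas 0.7 did): aggregate over the groups that are present, then reindex against the full Cartesian product of key values. The column names `f1`, `f2`, `val` and the data are made up for illustration. The same `full_index` also gives the list of missing combinations, which is what an informative error message would need.

```python
import pandas as pd

# Made-up toy data: the combination f1='y', f2='b' never occurs.
df = pd.DataFrame({
    "f1": ["x", "x", "y"],
    "f2": ["a", "b", "a"],
    "val": [1.0, 2.0, 3.0],
})

# Aggregate over the combinations that are actually present.
means = df.groupby(["f1", "f2"])["val"].mean()

# Every possible (f1, f2) pair, whether or not it has data.
full_index = pd.MultiIndex.from_product(
    [sorted(df["f1"].unique()), sorted(df["f2"].unique())],
    names=["f1", "f2"],
)

# Option 1: keep all combinations, with NaN where there was no data.
filled = means.reindex(full_index)

# Option 2: report which combinations have no data entries.
missing = full_index.difference(means.index)
if len(missing) > 0:
    print(f"combinations without data: {list(missing)}")
```

Reindexing keeps the aggregation itself unchanged and only decides afterwards how to present absent combinations.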


wesm · 2012-01-12T17:36:50Z

could you give me a self-contained test case?

related to #443

yarikoptic · 2012-01-12T17:52:06Z

Pity, but I fail to come up with a minimal example -- indeed it just inserts NAs for those, so maybe it is a different scenario... I will keep it in mind; maybe I will come up with one eventually ;)

yarikoptic · 2012-01-17T16:57:04Z

OK -- here is a non-minimal example. It seems to boil down to me somewhat abusing the index (I have a 'subject' column which is also used as part of the MultiIndex for rows). Here is some sample data (just gunzip it):
http://www.onerussian.com/tmp/data4wes.hdf5.gz and this is a snippet to demonstrate the problem:

from pandas import *
store_ = HDFStore('/tmp/data4wes.hdf5')
pivot_table(store_['d'], 'RT', rows=['subject'], cols=['condition', 'pgender', 'gaze'], margins=True)
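The HDF5 file may no longer be available, so here is a self-contained sketch of the same kind of pivot on synthetic data, written for current pandas, where the `rows=`/`cols=` keywords were later renamed `index=`/`columns=`. The subject IDs and RT values are invented; as in the real data, `gaze` is assumed to be determined by `condition`, so only some (condition, pgender, gaze) combinations occur.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic trials: gaze is tied to condition, so of the 4 x 2 x 2 possible
# (condition, pgender, gaze) combinations only 8 ever occur.
records = []
for subject in ["s01", "s02"]:
    for condition, gaze in [("full_ag", "a"), ("full_dg", "d"),
                            ("profile_ag", "a"), ("profile_dg", "d")]:
        for pgender in ["f", "m"]:
            for _ in range(3):  # three trials per cell
                records.append({
                    "subject": subject,
                    "condition": condition,
                    "pgender": pgender,
                    "gaze": gaze,
                    "RT": rng.uniform(1.5, 3.0),
                })
d = pd.DataFrame(records)

# Same call as above, with the modern keyword names.
table = pd.pivot_table(
    d,
    values="RT",
    index=["subject"],
    columns=["condition", "pgender", "gaze"],
    aggfunc="mean",
    margins=True,
)
print(table.round(3))
```

The bottom-right margin cell should equal the overall mean RT across all trials.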

wesm · 2012-01-17T18:45:57Z

@yarikoptic there are actually a couple of bugs here. Note this works fine:


In [8]: d = store['d']

In [9]: d.groupby(['condition', 'pgender', 'gaze', 'subject'])['RT'].mean()
Out[9]: 
condition   pgender  gaze  subject  
full_ag     f        a     01jul10sc    2.312
full_ag     m        a     01jul10sc    2.507
full_dg     f        d     01jul10sc    1.905
full_dg     m        d     01jul10sc    2.137
profile_ag  f        a     01jul10sc    1.698
profile_ag  m        a     01jul10sc    2.408
profile_dg  f        d     01jul10sc    2.481
profile_dg  m        d     01jul10sc    2.912

but this does not:


In [10]: d.groupby(['condition', 'pgender', 'gaze', 'subject'])['RT'].agg(np.mean)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/home/wesm/Downloads/<ipython-input-10-976e1dfbc45e> in <module>()
----> 1 d.groupby(['condition', 'pgender', 'gaze', 'subject'])['RT'].agg(np.mean)

/home/wesm/code/pandas/pandas/core/groupby.pyc in agg(self, func, *args, **kwargs)
    283         See docstring for aggregate
    284         """
--> 285         return self.aggregate(func, *args, **kwargs)
    286 
    287     def _iterate_slices(self):

/home/wesm/code/pandas/pandas/core/groupby.pyc in aggregate(self, func_or_funcs, *args, **kwargs)
    779         else:
    780             if len(self.groupings) > 1:
--> 781                 return self._python_agg_general(func_or_funcs, *args, **kwargs)
    782 
    783             try:

/home/wesm/code/pandas/pandas/core/groupby.pyc in _python_agg_general(self, func, *args, **kwargs)
    394             try:
    395                 result, counts = self._aggregate_series(obj, agg_func,
--> 396                                                         comp_ids, max_group)
    397                 output[name] = result
    398             except TypeError:

/home/wesm/code/pandas/pandas/core/groupby.pyc in _aggregate_series(self, obj, func, group_index, ngroups)
    412             return _aggregate_series_fast(obj, func, group_index, ngroups)
    413         except Exception:
--> 414             return self._aggregate_series_pure_python(obj, func, ngroups)
    415 
    416     def _aggregate_series_pure_python(self, obj, func, ngroups):

/home/wesm/code/pandas/pandas/core/groupby.pyc in _aggregate_series_pure_python(self, obj, func, ngroups)
    432                     raise ValueError('function does not reduce')
    433 
--> 434             counts[label] = group.shape[0]
    435             result[label] = res
    436 

IndexError: index out of bounds
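In current pandas the two calls above should agree. Here is a quick regression-style check on made-up data with the same sparse key structure, comparing the built-in `mean()` with a Python callable passed to `agg`. A lambda is used instead of `np.mean` because recent pandas versions may map `np.mean` onto the built-in method (with a warning) rather than running it as a user-defined function.

```python
import pandas as pd

# Made-up data: one subject, gaze tied to condition, so only 6 of the
# possible (condition, pgender, gaze, subject) combinations are present.
d = pd.DataFrame({
    "condition": ["full_ag", "full_ag", "full_dg", "full_dg",
                  "profile_ag", "profile_dg", "profile_dg"],
    "pgender": ["f", "m", "f", "m", "f", "m", "m"],
    "gaze": ["a", "a", "d", "d", "a", "d", "d"],
    "subject": ["s01"] * 7,
    "RT": [2.3, 2.5, 1.9, 2.1, 1.7, 2.9, 3.1],
})
keys = ["condition", "pgender", "gaze", "subject"]

# Built-in groupby mean versus an arbitrary Python reducer via agg().
builtin = d.groupby(keys)["RT"].mean()
via_udf = d.groupby(keys)["RT"].agg(lambda s: s.mean())

# Raises AssertionError if the two results differ.
pd.testing.assert_series_equal(builtin, via_udf)
print(builtin)
```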

thanks for reproducing! This is a blocker for 0.7.0 so I will fix asap...

yarikoptic · 2012-01-17T19:21:36Z

> @yarikoptic there are actually a couple of bugs here. Note this works fine:
> ...
> thanks for reproducing!

Glad to be of "help" ;-)

> This is a blocker for 0.7.0 so I will fix asap...

Cool -- thanks in advance

=------------------------------------------------------------------=
Keep in touch www.onerussian.com
Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic


wesm · 2012-01-17T20:34:00Z

Alright, this is all set and fixed in master

wesm added a commit that referenced this issue Jan 17, 2012

BUG: fix multi-iter bug and pure Python aggregation given compressed ids re: GH #618

81ba24d
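The commit title ("pure Python aggregation given compressed ids") and the traceback suggest one plausible reading of the `IndexError`: with several keys, each row can get an id over the full Cartesian product of key values, while `counts`/`result` are sized by the number of groups actually observed (`ngroups`), so any id past that size overflows. The NumPy sketch below illustrates that reading and the compression fix. It is not pandas' actual code; the helper `aggregate_pure_python` is hypothetical, loosely modeled on `_aggregate_series_pure_python` from the traceback.

```python
import numpy as np

# Hypothetical integer codes for three grouping keys (think condition,
# pgender, gaze), one entry per row. gaze is tied to condition, so only
# 8 of the 4 * 2 * 2 = 16 possible combinations occur.
cond = np.array([0, 0, 1, 1, 2, 2, 3, 3])
gender = np.array([0, 1, 0, 1, 0, 1, 0, 1])
gaze = np.array([0, 0, 1, 1, 0, 0, 1, 1])
values = np.array([2.3, 2.5, 1.9, 2.1, 1.7, 2.4, 2.5, 2.9])

# Ids over the full Cartesian product: any value in 0..15.
flat_ids = np.ravel_multi_index((cond, gender, gaze), (4, 2, 2))

# Compressed ids: renumber the observed combinations as 0..ngroups-1.
observed, comp_ids = np.unique(flat_ids, return_inverse=True)
ngroups = len(observed)


def aggregate_pure_python(ids, ngroups, values, func):
    """Hypothetical per-group loop; assumes every id is < ngroups."""
    counts = np.zeros(ngroups, dtype=np.int64)
    result = np.empty(ngroups)
    for label in np.unique(ids):
        group = values[ids == label]
        counts[label] = group.shape[0]  # IndexError if label >= ngroups
        result[label] = func(group)
    return result, counts


# Works: compressed ids always fit in arrays of length ngroups.
result, counts = aggregate_pure_python(comp_ids, ngroups, values, np.mean)

# Fails: uncompressed ids run past ngroups (8 observed, ids up to 15).
try:
    aggregate_pure_python(flat_ids, ngroups, values, np.mean)
except IndexError as exc:
    print("uncompressed ids:", exc)
```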

wesm added a commit that referenced this issue Jan 17, 2012

BUG: passing a string to aggfunc like 'mean' won't break _add_margins, GH #618

62d12c4

wesm closed this as completed Jan 17, 2012
