Merge pull request #3017 from y-p/GH3011
segmentation fault on groupby with categorical grouper of mismatched len
y-p committed Mar 16, 2013
2 parents 32ad737 + 69b6d60 commit 6e7b37b
Showing 3 changed files with 18 additions and 0 deletions.
4 changes: 4 additions & 0 deletions doc/source/v0.11.0.txt
@@ -321,6 +321,9 @@ Bug Fixes
- Fix pretty-printing of infinite data structures (closes GH2978_)
- Fixed exception when plotting timeseries bearing a timezone (closes GH2877_)
- str.contains ignored na argument (GH2806_)
- Substitute warning for segfault when grouping with categorical grouper
  of mismatched length (GH3011_)


See the `full release notes
<https://github.com/pydata/pandas/blob/master/RELEASE.rst>`__ or issue tracker
@@ -337,3 +340,4 @@ on GitHub for a complete list.
.. _GH2806: https://github.com/pydata/pandas/issues/2806
.. _GH2807: https://github.com/pydata/pandas/issues/2807
.. _GH2918: https://github.com/pydata/pandas/issues/2918
.. _GH3011: https://github.com/pydata/pandas/issues/3011
5 changes: 5 additions & 0 deletions pandas/core/groupby.py
@@ -1310,6 +1310,11 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True):
            exclusions.append(gpr)
            name = gpr
            gpr = obj[gpr]

        if (isinstance(gpr,Categorical) and len(gpr) != len(obj)):
            errmsg = "Categorical grouper must have len(grouper) == len(data)"
            raise AssertionError(errmsg)

        ping = Grouping(group_axis, gpr, name=name, level=level, sort=sort)
        groupings.append(ping)

9 changes: 9 additions & 0 deletions pandas/tests/test_groupby.py
@@ -2237,6 +2237,15 @@ def test_groupby_first_datetime64(self):
        got_dt = result.dtype
        self.assert_(issubclass(got_dt.type, np.datetime64))

    def test_groupby_categorical_unequal_len(self):
        import pandas as pd
        #GH3011
        series = Series([np.nan, np.nan, 1, 1, 2, 2, 3, 3, 4, 4])
        bins = pd.cut(series.dropna(), 4)

        # len(bins) != len(series) here
        self.assertRaises(AssertionError,lambda : series.groupby(bins).mean())

def assert_fp_equal(a, b):
    assert((np.abs(a - b) < 1e-12).all())
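
Note: the guard added to `_get_grouper` comes down to a length comparison made before any grouping work starts. The sketch below illustrates that idea in plain Python, without pandas. The helper name `check_grouper_length` and the `"low"`/`"high"` labels are hypothetical and are not part of this commit. Only the error message is copied from the diff.

```python
def check_grouper_length(grouper, data):
    """Hypothetical helper mirroring the length guard in the diff above."""
    if len(grouper) != len(data):
        # Same message as the AssertionError raised in _get_grouper.
        raise AssertionError(
            "Categorical grouper must have len(grouper) == len(data)"
        )


# Illustrative data: the two missing values are dropped before labelling,
# so the labels end up shorter than the data they are meant to group.
data = [None, None, 1, 1, 2, 2, 3, 3, 4, 4]
labels = ["low" if v < 3 else "high" for v in data if v is not None]

try:
    check_grouper_length(labels, data)
    outcome = "accepted"
except AssertionError as exc:
    outcome = f"rejected: {exc}"

print(outcome)
```

Failing early with an explicit error keeps the mismatched grouper from reaching the lower-level grouping code, which is where the segfault reported in GH3011 occurred.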
