BUG: Series.isin fails on categoricals #16858

Merged
merged 4 commits into from
Jul 11, 2017
Changes from 2 commits
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.21.0.txt
@@ -188,7 +188,7 @@ Numeric

Categorical
^^^^^^^^^^^

- Bug in ``Series.isin()`` when called for categoricals (:issue:`16639`)
Contributor


Use :func:`Series.isin()`

when called with a categorical


Other
^^^^^
25 changes: 13 additions & 12 deletions pandas/core/algorithms.py
@@ -3,14 +3,18 @@
intended for public consumption
"""
from __future__ import division

from warnings import warn, catch_warnings
import numpy as np

from pandas import compat, _np_version_under1p8
from pandas.compat import string_types
from pandas.compat.numpy import _np_version_under1p10
from pandas.core import common as com

import numpy as np
from pandas._libs import algos, lib, hashtable as htable
from pandas._libs.tslib import iNaT
from pandas.core.dtypes.cast import maybe_promote
from pandas.core.dtypes.generic import (
ABCSeries, ABCIndex,
Contributor


is there a reason you moved the imports around?

Contributor Author


None; that was accidental, possibly caused by PyDev or some other fluke. I've tried to restore the baseline order.

ABCIndexClass, ABCCategorical)
from pandas.core.dtypes.common import (
is_unsigned_integer_dtype, is_signed_integer_dtype,
is_integer_dtype, is_complex_dtype,
@@ -26,19 +30,15 @@
_ensure_platform_int, _ensure_object,
_ensure_float64, _ensure_uint64,
_ensure_int64)
from pandas.compat.numpy import _np_version_under1p10
from pandas.core.dtypes.generic import (
ABCSeries, ABCIndex,
ABCIndexClass, ABCCategorical)
from pandas.core.dtypes.missing import isnull

from pandas.core import common as com
from pandas.compat import string_types
from pandas._libs import algos, lib, hashtable as htable
from pandas._libs.tslib import iNaT


# --------------- #
# dtype access #
# --------------- #

def _ensure_data(values, dtype=None):
"""
routine to ensure that our data is of the correct
@@ -113,7 +113,8 @@ def _ensure_data(values, dtype=None):

return values.asi8, dtype, 'int64'

-elif is_categorical_dtype(values) or is_categorical_dtype(dtype):
+elif (is_categorical_dtype(values) and
+      (is_categorical_dtype(dtype) or dtype is None)):
values = getattr(values, 'values', values)
values = values.codes
dtype = 'category'
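The effect of tightening the condition in `_ensure_data` can be illustrated with a toy model. This is not pandas code: `ToyCategorical`, `ensure_data` and `toy_isin` are hypothetical stand-ins. The sketch also assumes (an interpretation, not something the diff states) that `isin` normalizes the left-hand data first and then coerces the lookup values to that data's dtype. Under that assumption, the old "either side is categorical" test turns a categorical lookup set into integer codes even when the data being searched holds plain labels, so nothing ever matches.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToyCategorical:
    """Hypothetical stand-in for pandas.Categorical: integer codes into categories."""
    codes: List[int]
    categories: List[str]

    def dense(self) -> List[str]:
        # Materialize the labels, roughly what converting to an object array does.
        return [self.categories[c] for c in self.codes]


def ensure_data(values, dtype: Optional[str] = None,
                fixed: bool = True) -> Tuple[list, str]:
    """Toy dtype dispatch that normalizes `values` for hashing.

    fixed=False models the pre-patch test:
        values is categorical OR dtype == "category"
    fixed=True models the patched test:
        values is categorical AND dtype in ("category", None)
    """
    is_cat = isinstance(values, ToyCategorical)
    if fixed:
        use_codes = is_cat and dtype in ("category", None)
    else:
        use_codes = is_cat or dtype == "category"
    if use_codes:
        # Compare on integer codes (only meaningful if both sides share categories).
        return list(values.codes), "category"
    if is_cat:
        return values.dense(), "object"
    return list(values), "object"


def toy_isin(comps, values, fixed: bool = True) -> List[bool]:
    # Normalize the searched data first, then coerce the lookup values to its dtype.
    comps_data, dtype = ensure_data(comps, fixed=fixed)
    values_data, _ = ensure_data(values, dtype=dtype, fixed=fixed)
    lookup = set(values_data)
    return [x in lookup for x in comps_data]


if __name__ == "__main__":
    left = ["a", "b", "c", "a"]  # plain labels (assumed densified data)
    right = ToyCategorical(codes=[0, 1], categories=["a", "b", "c"])
    print(toy_isin(left, right, fixed=False))  # [False, False, False, False]
    print(toy_isin(left, right, fixed=True))   # [True, True, False, True]
```

With the old test, the lookup set becomes `{0, 1}` and no label matches; with the patched test, the codes path is taken only when the target dtype is categorical or unspecified, so the lookup values are compared as labels.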
11 changes: 11 additions & 0 deletions pandas/tests/test_categorical.py
@@ -442,6 +442,17 @@ def f():
if hasattr(np.random, "choice"):
codes = np.random.choice([0, 1], 5, p=[0.9, 0.1])
pd.Categorical.from_codes(codes, categories=["train", "test"])

Contributor


make this a new test, move it to test_algos.py (near the other isin tests)

# Regression test https://github.com/pandas-dev/pandas/issues/16639
vals = np.array([0, 1, 2, 0]);
cats = ['a', 'b', 'c'];

D = pd.DataFrame({'id': pd.Series(pd.Categorical(1).from_codes(vals, cats))});
Contributor


you have some linting errors, these lines are too long

T = pd.DataFrame({'id': pd.Series(pd.Categorical(1).from_codes(np.array([0, 1]), cats))});

select_ids = D['id'].isin(T['id']);
Contributor


Use `result = ...`, put `expected = ...` below it, and compare them with `tm.assert_numpy_array_equal`.


assert( np.all(select_ids == np.array([True, True, False, True]) ) )

def test_validate_ordered(self):
# see gh-14058
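Applying the review suggestions (a standalone test, `result =` / `expected =`, shorter lines, no trailing semicolons), the regression test might look like the sketch below. It is not the test that was eventually merged, and the name `test_isin_cats` is hypothetical. Because `pandas.util.testing` (the `tm` in the review) is not available in recent pandas releases, the sketch compares with `numpy.testing.assert_array_equal` instead of `tm.assert_numpy_array_equal`. It also calls `Categorical.from_codes` on the class rather than on a throwaway `pd.Categorical(1)` instance.

```python
import numpy as np
import pandas as pd


def test_isin_cats():
    # Regression test for GH 16639: Series.isin with categorical data
    cats = ["a", "b", "c"]
    left_codes = np.array([0, 1, 2, 0])
    right_codes = np.array([0, 1])
    left = pd.Series(pd.Categorical.from_codes(left_codes, cats))
    right = pd.Series(pd.Categorical.from_codes(right_codes, cats))

    result = left.isin(right).values
    expected = np.array([True, True, False, True])
    np.testing.assert_array_equal(result, expected)
```

Under pytest the function is collected automatically; in the pandas tree of that era it would sit next to the other `isin` tests in `test_algos.py`, as the reviewer asked.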