Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Categorical.take and Series([Categorical]).take is inconsistent with other dtypes #20664

Closed
jorisvandenbossche opened this issue Apr 12, 2018 · 5 comments · Fixed by #20841
Labels
Bug Categorical Categorical Data Type Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@jorisvandenbossche
Copy link
Member

Related to #20582 and #20640

Categorical.take does not do any bounds checking, so produces wrong results:

In [9]: cat = pd.Categorical(['a', 'b', 'a'])

In [12]: cat.take([3])
Out[12]: 
[0]                       # <--------- non-sensical result (produces invalid Categorical), should error instead
Categories (2, object): [a, b]

In [10]: cat.take([-1])
Out[10]: 
[NaN]                      # <--------- this is correct
Categories (2, object): [a, b]

In [15]: cat[:0].take([0])
Out[15]: 
[a]                             # <--------- should also error
Categories (2, object): [a, b]

Series.take with categorical data does not follow the numpy behaviour of "-1" (starting from end):

In [22]: s = pd.Series(cat)

In [24]: s.take([-1])
Out[24]: 
2    NaN            # <--------- wrongly NaN, because it dispatches to `Categorical.take` which returns this
dtype: category
Categories (2, object): [a, b]

In [25]: s.astype(object).take([-1])
astype
Out[25]: 
2    a
dtype: object
@jorisvandenbossche jorisvandenbossche added Bug Categorical Categorical Data Type labels Apr 12, 2018
@jorisvandenbossche jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label Apr 12, 2018
@jorisvandenbossche jorisvandenbossche added this to the 0.23.0 milestone Apr 12, 2018
@jorisvandenbossche
Copy link
Member Author

(at least the Series.take part is a regression from 0.20)

@TomAugspurger
Copy link
Contributor

FYI, I'm working on this. Will have a PR tonight hopefully.

@TomAugspurger
Copy link
Contributor

Ok, will make a PR once #20814 is merged. One point to clarify, we only want to warn when the user passes negative values, correct?

cat = pd.Categorical(['a', 'b'])
cat.take([0, -1])  # FutureWarning
cat.take([0, 1])  # no warning

@jorisvandenbossche
Copy link
Member Author

One point to clarify, we only want to warn when the user passes negative values, correct?

Yes. (I don't think there is any change in behaviour for the other case?)

@jorisvandenbossche
Copy link
Member Author

Personally, I would even consider breaking it without deprecation (would people really use this?), but either is fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Categorical Categorical Data Type Regression Functionality that used to work in a prior pandas version
Projects
None yet
2 participants