Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: inplace dtype-changing methods on Categorical #37643

Closed
6 tasks done
jbrockmendel opened this issue Nov 5, 2020 · 15 comments · Fixed by #49321
Closed
6 tasks done

DEPR: inplace dtype-changing methods on Categorical #37643

jbrockmendel opened this issue Nov 5, 2020 · 15 comments · Fixed by #49321
Assignees
Labels
Bug Categorical Categorical Data Type Deprecate Functionality to remove in pandas

Comments

@jbrockmendel
Copy link
Member

jbrockmendel commented Nov 5, 2020

The ability to alter an existing array's dtype is a footgun. We can return a new object without having to make a copy.

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member Categorical Categorical Data Type Deprecate Functionality to remove in pandas labels Nov 5, 2020
@jorisvandenbossche
Copy link
Member

Also set_categories.

All those methods have a inplace=False keyword, so by default they don't actually change the existing array's dtype. And with inplace=True this can be expected? (it's a bit similar issue as with inplace deprecation discussion, but also here a copy=True/False keyword might be more appropriate ...)

@AlexKirko
Copy link
Member

Hard to think of a use case where changing a Categorical dtype in place would be intended, but easy to think of use cases where it would lead to unexpected errors. Probably best to deprecate this and raise.

@rhshadrach rhshadrach removed the Needs Triage Issue that has not been reviewed by a pandas team member label Nov 6, 2020
@OlehKSS
Copy link
Contributor

OlehKSS commented Nov 17, 2020

Hi, I've created a pull request to solve this issue for Categorical.remove_unused_categories. Could you check it?
For the other methods, should I create separate pull requests for each one of them so as to keep them small?

@OlehKSS
Copy link
Contributor

OlehKSS commented Nov 18, 2020

take

@jorisvandenbossche
Copy link
Member

I am not sure we should deprecate the inplace ability for all of those. For example, for set_categories, or rename_categories, this operation can actually happen inplace, and can thus be beneficial.

@TomAugspurger
Copy link
Contributor

Agreed. It can be expensive to copy all the data (codes) when not necessary.

@jbrockmendel
Copy link
Member Author

Agreed. It can be expensive to copy all the data (codes) when not necessary.

i was thinking we'd get a view on the codes, just a new Categorical object. This should be cheap.

@OlehKSS
Copy link
Contributor

OlehKSS commented Nov 20, 2020

I've added another pull request to deprecate inplace in remove_categories.
As far as I understand, a deep copy of the underlying ndarray was created for inplace=False in set_categories and rename_categories.
Should I just keep the current behavior of these methods?

@jreback
Copy link
Contributor

jreback commented Nov 26, 2020

@jorisvandenbossche @TomAugspurger the inplace ops here seem like a footgun; sure they may have a small cost (but could be mitgated as @jbrockmendel indicates), but we don't allow anything like this on any other accessor operation. +1 on deprecation.

@OlehKSS
Copy link
Contributor

OlehKSS commented Nov 30, 2020

@jbrockmendel
Should the inplace parameter be deprecated in the Categorical.replace method as well? This method uses Cateogrical.remove_categories internally

@jbrockmendel
Copy link
Member Author

Without looking at the implementation: Categorical.replace should be able to alter the values (i.e. self._codes) inplace, but not able to alter the dtype (i.e. self._categories)

@ivirshup
Copy link
Contributor

ivirshup commented Jan 4, 2021

i was thinking we'd get a view on the codes, just a new Categorical object. This should be cheap.

It would be nice to be able to control this behaviour, maybe with copy. Right now, I'm not sure it's intuitive that inplace=False would return a view on the original data. Also methods like, remove_unused_categories modify the codes, so presumably this couldn't return a view unless the operation was mutating.

As a side note, I like the ability to modify these inplace – though I see that mutating the type (if that's what dtypes are) is strange.

@jbrockmendel
Copy link
Member Author

all the checkboxes are checked; closing

@jbrockmendel
Copy link
Member Author

Re-opening, never addressed the "option to return a view" part of the issue.

@jbrockmendel jbrockmendel reopened this Jan 11, 2022
@jbrockmendel
Copy link
Member Author

I'm having second thoughts about this. We currently allow setting categories using cat.categories = some_list, for which i don't see an obvious non-copy-but-not-inplace alternative.

The category-setting is a pretty convenient pattern, not sure how to trade this off against modifying-dtype-inplace being an antipattern.

@jbrockmendel jbrockmendel mentioned this issue Apr 15, 2022
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Categorical Categorical Data Type Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants