API/DOC: Deprecate and Advise against having np.nan in Categoricals #10748
hmm, we don't have an explicit prohibition in ... I think if you added a test and checked in validate_categories (maybe exempting ...
Unless someone comes up with a good use case for NaNs in the categories, I think we should just disallow it. I think this will make things a lot easier: even if we advise against it in the docs, if it is possible, people will do it and we will have to deal with the bug reports.
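A minimal sketch of what disallowing NaN categories could look like, written in plain Python so it runs without pandas. The function name `reject_nan_categories` is hypothetical; it is not the pandas `validate_categories` code mentioned above, and treating only `None` and float NaN as missing is an assumption made for this sketch.

```python
import math
from typing import Hashable, Iterable, List


def _is_missing(value: object) -> bool:
    # Assumption for this sketch: None and float NaN count as "missing".
    # np.nan is a plain Python float, so it is covered by the isinstance check.
    return value is None or (isinstance(value, float) and math.isnan(value))


def reject_nan_categories(categories: Iterable[Hashable]) -> List[Hashable]:
    """Hypothetical check: raise instead of accepting NaN as a category."""
    cats = list(categories)
    if any(_is_missing(c) for c in cats):
        raise ValueError("categories must not contain NaN/None")
    return cats


print(reject_nan_categories(["a", "b"]))  # ['a', 'b']
try:
    reject_nan_categories(["a", float("nan")])
except ValueError as err:
    print("rejected:", err)
```

Raising outright is the stricter of the two options discussed in this thread; missing values in the data itself are still representable, just not as a category.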
Ok, I've updated the title to recommend deprecation (unless there's a good reason otherwise). Later tonight I'll try out a check disallowing it and see what happens.
The original reason, AFAIR, was that R handled that corner case (NA as a factor level and NA as missing -> two different NAs in a vector). See here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/factor.html
Thanks. I think this is an area where the pandas way of handling missing data (np.nan for everything) means it makes sense to differ from R, unless the potential of adding integer NA with dynd means we would want to allow NA in the ...
I agree, I don't think there is anything to be gained from copying this R behavior in pandas.
@TomAugspurger let's just change this. I don't even think it's worth deprecating.
@TomAugspurger any chance you would have some time for this to get it into 0.17?
I'll give it a shot today. I think ...
Yes, a deprecation warning is fine.
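Since the thread settled on a deprecation warning rather than an immediate error, a deprecation path might look like the sketch below. `warn_on_nan_categories` and its message text are illustrative assumptions, not the code that landed in pandas 0.17.0. `FutureWarning` is used because it is Python's standard warning category for deprecations aimed at end users.

```python
import math
import warnings
from typing import Hashable, Iterable, List


def warn_on_nan_categories(categories: Iterable[Hashable]) -> List[Hashable]:
    """Hypothetical deprecation path: accept NaN categories for now, but warn."""
    cats = list(categories)
    if any(isinstance(c, float) and math.isnan(c) for c in cats):
        # Illustrative message; not the wording pandas actually uses.
        warnings.warn(
            "NaN in categories is deprecated and will be disallowed "
            "in a future version.",
            FutureWarning,
            stacklevel=2,  # attribute the warning to the caller, not this helper
        )
    return cats


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_on_nan_categories(["a", float("nan")])
print([w.category.__name__ for w in caught])  # ['FutureWarning']
```

Warning first and raising later gives users one release to move NaN out of their categories before the behavior changes.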
Deprecated in 0.17.0. xref pandas-devgh-10748
Deprecated in 0.17.0. xref #10748 xref #13648
Author: Jeff Reback <jeff@reback.net>
Author: gfyoung <gfyoung17@gmail.com>
Closes #15806 from gfyoung/categories-nan-drop and squashes the following commits:
318175b [Jeff Reback] TST: test pd.NaT with correct dtype
4dce349 [gfyoung] Drop support for NaN categories in Categorical
This came out of work on #10729.
In the documentation, we mention that ...
In the first case, NaN is not in .categories, and in the second case it is. I think we should only recommend the first.
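To make the two cases concrete, here is a plain-Python sketch of categorical codes, where each value is stored as its position in the categories and -1 marks a missing value. `encode_nan_aware` is a hypothetical helper, not pandas' own factorization; it matches a NaN value against a NaN category explicitly because `nan != nan`.

```python
import math


def is_nan(x: object) -> bool:
    return isinstance(x, float) and math.isnan(x)


def encode_nan_aware(values, categories):
    """Hypothetical encoder: position of each value in `categories`, or -1.

    A NaN value is matched to a NaN category explicitly, since nan != nan.
    """
    codes = []
    for v in values:
        code = -1  # -1 means "missing / not among the categories"
        for i, c in enumerate(categories):
            if c == v or (is_nan(c) and is_nan(v)):
                code = i
                break
        codes.append(code)
    return codes


values = ["a", "b", float("nan")]

# Case 1: NaN is not a category, so it is stored as missing (code -1).
case1 = encode_nan_aware(values, ["a", "b"])

# Case 2: NaN is itself a category, so it gets an ordinary code.
case2 = encode_nan_aware(values, ["a", "b", float("nan")])

print(case1)  # [0, 1, -1]
print(case2)  # [0, 1, 2]
```

The same three values end up with two different code arrays, so any code that consumes the codes has to handle both -1 and a NaN category.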
The option of NaNs in the categories makes the code in #10729 less pleasant than it would be otherwise. I don't think we should error if NaNs are included, just advise against it in the docs. Perhaps a deprecation, but I worry that I'm missing some obvious reason why NaNs were allowed in .categories.
@JanSchulz do you remember the initial reason for allowing either representation?
Some bad things that come out of NaN in .categories:
- nan mapping to a code of -1 ...
- ... nan) in the .categories Index.
... NaN in categories.
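The pitfall of NaN mapping to a code of -1 can be reproduced with a naive lookup that relies on equality alone: under IEEE 754, NaN compares unequal to everything, including itself, so a NaN category is never matched. `encode_naive` is a hypothetical helper for illustration only; it does not describe how pandas computes codes.

```python
def encode_naive(values, categories):
    """Hypothetical encoder that relies on == only, as a plain lookup would."""
    return [
        # First index whose category equals the value, else -1 (missing).
        next((i for i, c in enumerate(categories) if c == v), -1)
        for v in values
    ]


nan = float("nan")
categories = ["a", "b", nan]  # NaN has been added as a category

codes = encode_naive(["a", nan], categories)
print(codes)       # [0, -1]: the NaN value maps to -1, not to the NaN category
print(nan == nan)  # False, even for the very same object
```

The NaN category at position 2 is never used, so the data and the categories disagree about what NaN means.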