-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: change unique to return Index #13979
Conversation
what about for. multi index? |
@jreback changed to return MI. Adding explicit tests.
|
result = unique1d(values) | ||
|
||
if isinstance(self, ABCCategoricalIndex): | ||
# CategoricalIndex._shallow_copy uses keeps original categories |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think u need special case for CI
it defaults to same categories and ordered
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to update categories
because Categorical.unique
reorders categories
if ordered=False
. This can be simplified if Categorical.unique
doesn't change categories
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure anymore why we decided to reorder the categories, but that does not sounds logical to me. @sinhrks do you recall the reasoning?
The values in the index should be in order of appearance, but that does not mean the categories attribute should be changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I think we did have some discussion about this, but I don't remember,
cc @JanSchulz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR is #10508, and the why seems to be #10505. So @sinhrks needs to answer this :-)
Somehow this looks a lot like a user facing API (unique -> why does it change the categories at all? IMO it should just return all categories even íf some are unused: you also don't change maxint if it isn't in an integer series) was changed to fix a internal user which gave a undesired result (probably groupby shouldn't return empty groups?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reminder, I'd clean forgotten it.
it was mainly done to keep category
dtype, and not sure whether keeping categories
order breaks anything. let me check...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the actual decision how to sort/not sort the return value of Categorical.unique mattered for the bug fix. But, in retrospect, I think we took the wrong choice there. IMO, categoricals should just follow everything else: unique values in order of appearance, and leave the dtype/categories alone.
d4c1974
to
3839d3b
Compare
Current coverage is 85.27% (diff: 100%)@@ master #13979 diff @@
==========================================
Files 139 139
Lines 50502 50511 +9
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43063 43071 +8
- Misses 7439 7440 +1
Partials 0 0
|
@sinhrks ok as per discussion
|
yep, will update. |
f001cdd
to
8dcf98d
Compare
@@ -1005,6 +1034,7 @@ Bug Fixes | |||
- Bug in ``pd.read_csv()`` with ``engine='c'`` in which fields were not properly cast to float when quoting was specified as non-numeric (:issue:`13411`) | |||
- Bug in ``pd.read_csv``, ``pd.read_table``, ``pd.read_fwf``, ``pd.read_stata`` and ``pd.read_sas`` where files were opened by parsers but not closed if both ``chunksize`` and ``iterator`` were ``None``. (:issue:`13940`) | |||
- Bug in ``StataReader``, ``StataWriter``, ``XportReader`` and ``SAS7BDATReader`` where a file was not properly closed when an error was raised. (:issue:`13940`) | |||
- Bug in ``Series.unique()`` with datetime and timezone returns unique values without timezone, rather than return array of ``Timestamp`` with timezone (:issue:`13565`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put this also in the api changes section, as it was not a bug (it was a deliberate decision I think, but we changed our mind)
Looks good to me! |
Moved Just in case, categorical
|
``Index.unique`` consistently returns ``Index`` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
``Index.unique()`` now return unique values as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Index.unique()
now returns unique values as an Index
of the appropriate dtype
@sinhrks There is a small linting error:
Can be merged apart from that. |
@sinhrks Thanks a lot! |
git diff upstream/master | flake8 --diff