API/WIP: .sorted #10726
xref #10721

Sort by labels (along either axis), by the values in column(s) or both.

If both, labels take precedence over columns. If neither is specified,
behavior is object-dependent: Series = on values, Dataframe = on index.
Maybe now would be a good time to clean up this API discrepancy?
just copied the doc-string from another PR. What do you think it should say/do?
I wonder if we want to take this as an opportunity to clean up this discrepancy for default sort behavior:
It might make more sense for
How would that work for dataframes? What does 'values' mean in that context? I think it makes more sense to target their commonality: index.
I think what @shoyer meant is that the default could be: This is False in master
proposed:
This is True in master
OK, I see. Yes, I had misunderstood @shoyer, if that's what was meant. Thanks
I was thinking that the default for Alternatively, instead of adding
yeh I think
Furthermore, the following operations are implemented using
I think this trades one discrepancy (values vs. index) for another (default sort on axis=0 vs axis=1). Currently df.sort() is by index, which always made more sense to me. (Disclaimer: I work with timeseries.) What happens if/when
Actually,
I agree that sorting by the index usually makes sense -- if you have defined a meaningful index. This is often not the case with pandas, and for such users the default behavior of I think there is a lot to be said for requiring and/or encouraging users to be more explicit by having separate The main downside is that it forces users to consider whether something is in the index or the values, which is another confusing distinction we are trying to slowly get away from. So, there could still be a case for having a generic
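To make the index-versus-values distinction above concrete, here is a minimal sketch. It assumes a pandas release that provides `sort_index` and `sort_values` (0.17 or later); the frame, series, and variable names are made up for illustration, and this is not code from the PR.

```python
import pandas as pd

# Illustrative data only: the labels are deliberately out of order.
df = pd.DataFrame({"a": [3, 1, 2], "b": [9, 8, 7]}, index=["z", "x", "y"])
s = pd.Series([1, 3, 2], index=["z", "x", "y"])

# Explicit sort on the row labels.
df_by_index = df.sort_index()

# Explicit sort on the values of one column; the column must be named.
df_by_values = df.sort_values(by="b")

# The same two spellings work on a Series, so neither object needs
# an object-dependent default.
s_by_index = s.sort_index()
s_by_values = s.sort_values()
```

With both spellings available on both objects, the caller states whether the labels or the data drive the ordering.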
@jreback Thanks for working on this! I made an overview of the issue (gathered from the different issues, for anyone who wants to catch up): https://github.com/jorisvandenbossche/pandas/blob/sorting-api/doc/proposals/sorting-API.md I can make a PR from that if that is easier to comment on, but I am copying the discussion points I see here: Discussion points:
And my 2 cents at the moment:
1a) I think the new Related to this, I also don't think
FYI, the ok so here are my thoughts
b) I don't think sorting by all columns is intuitive at all, user should simply have to specify, -1 here.
c) sorted is good. SQL-like and matches the name in python, also clear sep from
FWIW, I agree with these recent comments. Had to take several steps back and consider how other generalized functions are handled before deciding the index isn't the best target after all. Sorting dataframes by all their columns doesn't feel right, though - the user should have to decide.
I'm perfectly fine with requiring an explicit
I agree with the current proposals, great work :)
So I think that I understand now why @wesm has a Imagine
in the new so we would now have to do If we were NOT to allow pass thru columns, then the signature would be almost exactly like currently this will fail (in master) as you cannot sort a non-MultiIndex by level (though that is easily fixed). so, I think to avoid ambiguity then we must either pick:
+1 for 5b.
I don't really see a reason to not integrate So I would go for 5c, although 5b is also OK.
…andas-dev#8239 DEPR: remove of na_last from Series.order/Series.sort, xref pandas-dev#5231
ok, I reverted on this a bit. Now have full deprecation (e.g.
also like to get this
This looks great! +1 to deprecating df.sort() et al. 🔥
ok, I am just going to merge this, and we'll see who whines :)
nice work!
Yes, I am really glad we have a nicer interface now! The reason I pressed for DeprecationWarning instead of FutureWarning for now is that when you upgrade your pandas version and are, e.g., using seaborn, you will get a lot of warnings that come from seaborn and that you cannot solve yourself.
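For context on the warning-class trade-off: Python's default filters hide a `DeprecationWarning` triggered outside `__main__`, while a `FutureWarning` is shown to end users. A minimal standard-library sketch, where `old_sort` and `sorted_copy` are hypothetical stand-ins rather than pandas functions:

```python
import warnings


def sorted_copy(values):
    """Hypothetical replacement API: return a new sorted list."""
    return sorted(values)


def old_sort(values):
    """Hypothetical deprecated entry point that forwards to the new one."""
    warnings.warn(
        "old_sort is deprecated; use sorted_copy instead",
        DeprecationWarning,  # swap for FutureWarning to reach end users by default
        stacklevel=2,        # attribute the warning to the caller's line
    )
    return sorted_copy(values)


# Record warnings explicitly so the example does not depend on default filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_sort([3, 1, 2])
```

`stacklevel=2` makes the reported location the line that called `old_sort`, which is what tells a user whether the call is in their own code or in a dependency.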
@jorisvandenbossche ok, let's leave that as an item for the rc then. maybe will make that change if too much complaining.
fix some stacklevels on warnings
Some failing tests in the previous commits because older ``pandas`` versions don't have ``Series.sort_values``. That method was only added in pandas 0.17, in pandas-dev/pandas#10726
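One way a downstream project could cope with both old and new releases is a small shim along these lines. This is only a sketch: `sort_series_values` is a hypothetical helper name, and the fallback assumes the older release still provides `Series.order`.

```python
def sort_series_values(series):
    """Return `series` sorted by value on both old and new pandas.

    Hypothetical helper: prefers `sort_values` (pandas 0.17+) and falls
    back to `order`, which is assumed to exist on older releases.
    """
    if hasattr(series, "sort_values"):
        return series.sort_values()
    return series.order()
```

Checking for the method rather than parsing a version string keeps the shim working on development builds.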
Clarifies the meaning of 'sort' in the context of `Categorical` to mean 'organization' rather than 'order', as it is possible to call this method (as well as `sort_values`) when the `Categorical` is unordered. Also patches a bug in `Categorical.sort_values` in which `na_position` was not being respected when `ascending` was set to `True`. This commit aligns the behaviour with that of `Series`. Finally, deprecates `sort` in favor of `sort_values`, which is in alignment with what was done with `Series` back in #10726. Closes #12785 Author: gfyoung <gfyoung17@gmail.com> Closes #12882 from gfyoung/categorical-sort-doc and squashes the following commits: f324a9c [gfyoung] BUG, DOC, DEP: Patch and Align Categorical's Sorting API
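A short sketch of the aligned behaviour that commit describes, assuming a pandas release in which `Categorical.sort_values` and `Series.sort_values` both accept `na_position`; the data is made up for illustration.

```python
import pandas as pd

# An unordered Categorical can still be sorted; missing values go
# wherever na_position says, including when ascending=True.
cat = pd.Categorical(["b", None, "a"])
cat_sorted = cat.sort_values(ascending=True, na_position="first")

# The equivalent call on a Series, for comparison.
ser = pd.Series([2.0, None, 1.0])
ser_sorted = ser.sort_values(ascending=True, na_position="first")
```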
Affected classes: 1) Index 2) Series 3) DataFrame xref pandas-devgh-10726
closes #9816
closes #8239