sort_index behavior differs for the same DataFrame? #9212
Comments
FWIW:
|
Under the hood this uses something like numpy.argsort (http://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html). This is … So I'll mark this as an enhancement if you'd like to put forth a PR. |
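For readers unfamiliar with argsort: it returns the positions that would put an array in ascending order, and a sort is then applied by taking rows in that order. A minimal sketch for a single-level index, assuming NumPy is installed (the variable names are illustrative, not pandas internals):

```python
import numpy as np

# Index values in the order the rows currently appear, plus one data column.
index_values = np.array(["c", "a", "b"])
row_data = np.array([30, 10, 20])

# Positions that would sort the index; "stable" keeps ties in their original order.
indexer = np.argsort(index_values, kind="stable")

# Reorder both the index and the row data with the same indexer.
sorted_index = index_values[indexer]
sorted_rows = row_data[indexer]

print(indexer.tolist())       # [1, 2, 0]
print(sorted_index.tolist())  # ['a', 'b', 'c']
print(sorted_rows.tolist())   # [10, 20, 30]
```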
I'll take a crack at it, thanks. |
I do see that … I'm not sure …
|
This is also odd behavior: adding a row doesn't simply append it to the bottom; it also switches the labels in the original two rows of the index.
|
This is a bug; the problem is that when the frame is modified, …
|
Hmm, looks like … |
It looks like this has been an issue at least going back to v0.13.1:
In v0.12.0, setting values was not working with …
|
The main issue I have with this is that you cannot simply re-sort the index after an append; it would be completely non-performant. So you can simply try to invalidate … |
Agreed. How do you invalidate the sortorder? I don't know what that means. |
Sorry, I misspoke: you need to invalidate the cache on lexsort_depth. |
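To make "lexsort depth" and its cache concrete, here is a toy model in plain Python. It is not pandas' implementation; `compute_lexsort_depth` and `ToyMultiIndex` are hypothetical names. The depth is taken to be the number of leading levels whose integer labels, read row by row as tuples, are non-decreasing, and the cached value is cleared whenever a row is appended:

```python
def compute_lexsort_depth(label_arrays):
    """Return how many leading levels have lexicographically sorted labels."""
    for k in range(len(label_arrays), 0, -1):
        # Row-wise tuples of the first k label arrays.
        rows = list(zip(*label_arrays[:k]))
        if all(a <= b for a, b in zip(rows, rows[1:])):
            return k
    return 0


class ToyMultiIndex:
    """Hypothetical stand-in holding only labels, with a cached lexsort depth."""

    def __init__(self, label_arrays):
        self.label_arrays = [list(a) for a in label_arrays]
        self._depth_cache = None

    @property
    def lexsort_depth(self):
        # Compute once, then reuse until something invalidates the cache.
        if self._depth_cache is None:
            self._depth_cache = compute_lexsort_depth(self.label_arrays)
        return self._depth_cache

    def append(self, row_labels):
        for arr, code in zip(self.label_arrays, row_labels):
            arr.append(code)
        # The new row may break the sort order, so the cached depth is stale.
        self._depth_cache = None


idx = ToyMultiIndex([[0, 0, 1], [0, 1, 0]])
depth_before = idx.lexsort_depth  # 2: rows (0,0), (0,1), (1,0) are sorted
idx.append((0, 2))
depth_after = idx.lexsort_depth   # 0: row (0,2) comes after (1,0)
```

Without the reset in `append`, the property would keep returning the stale value 2.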
I'm not sure that works.
|
So this does not depend on the cache; it also occurs on a freshly generated index:
It is worth checking whether anything else in the code base breaks if the labels are lexically sorted but the levels are not (i.e., whether other code assumes that the labels are always sorted).
If other places in the code also assume that the labels are sorted, then this should be fixed in … Computationally, the former path is not very cheap, so it is worth first confirming that other places in the code depend on the levels being sorted and break otherwise. |
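A pure-Python model of the situation described here: an index whose labels are lexsorted while its levels are not. The helpers `decode` and `sort_levels` are hypothetical, not pandas code. Sorting by the raw labels leaves the rows in the wrong order; sorting each level and remapping its labels first makes label order agree with value order. That extra pass over every level and label is one illustration of why such a fix is not free.

```python
def decode(levels, labels):
    """Rebuild the index tuples from levels and integer labels."""
    columns = [[level[code] for code in codes] for level, codes in zip(levels, labels)]
    return list(zip(*columns))


def sort_levels(levels, labels):
    """Sort each level and remap its labels so label order matches value order."""
    new_levels, new_labels = [], []
    for level, codes in zip(levels, labels):
        order = sorted(range(len(level)), key=lambda i: level[i])  # old positions, sorted
        remap = {old: new for new, old in enumerate(order)}
        new_levels.append([level[i] for i in order])
        new_labels.append([remap[c] for c in codes])
    return new_levels, new_labels


# Labels are lexsorted, but level 0 is ["b", "a"], i.e. not sorted.
levels = [["b", "a"], [1, 2]]
labels = [[0, 0, 1, 1], [0, 1, 0, 1]]
values = decode(levels, labels)

# Naive: sort positions by raw label tuples (already in order, so nothing moves).
naive_order = sorted(range(len(values)), key=lambda i: tuple(c[i] for c in labels))
naive_sorted = [values[i] for i in naive_order]

# Normalized: sort levels first, then sort positions by the remapped labels.
norm_levels, norm_labels = sort_levels(levels, labels)
order = sorted(range(len(values)), key=lambda i: tuple(c[i] for c in norm_labels))
sorted_values = [values[i] for i in order]
```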
@behzadnouri Thanks for looking more into this. I have a couple of comments. (FWIW, I've been using pandas since 0.9.1 but haven't needed to dig in until now, since it has really just worked. I hope to make a contribution myself one day. My mental model may be out of date given the fast pace of development.)
Because I think of … |
@tlmaloney If you'd like to create a separate issue for a distinct issue/bug, please do so, keeping in mind that it should have a reproducible example. You can always xref back here if needed. Generally, having one issue per 'thing' is a good idea. Here is another example, with different label lengths, to clarify your mental model:
So you see that the labels define how long the combinations are, while the length of the labels/levels lists themselves is the number of levels in the MI. The labels are an indexer INTO the levels array. This is conceptually what a … Note that this has nothing to do with … |
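The example referred to here did not survive extraction. As a stand-in with hypothetical values, assuming pandas 0.24 or later (where the constructor argument `labels` was renamed `codes`), this shows that each codes array has one entry per row, that there is one levels/codes pair per index level, and that each code is a position into its level:

```python
import pandas as pd

levels = [["a", "b"], ["x", "y", "z"]]   # two levels of different sizes
codes = [[0, 0, 1, 1, 1], [0, 2, 0, 1, 2]]  # one entry per row (5 rows)

mi = pd.MultiIndex(levels=levels, codes=codes)

# Decoding by hand: each code indexes INTO its level.
manual = list(zip(*([lev[c] for c in cs] for lev, cs in zip(levels, codes))))

print(mi.nlevels)  # 2, the number of levels/codes pairs
print(len(mi))     # 5, the length of each codes array
print(list(mi))    # [('a', 'x'), ('a', 'z'), ('b', 'x'), ('b', 'y'), ('b', 'z')]
```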
@jreback That's really interesting; I now understand what's going on a lot better, thanks. There is some cognitive dissonance for me between these two definitions:
Do you also see how it could be a bit confusing? I think there is a naming bug, but since naming is hard and is unrelated to the original issue, if you agree I can create a separate issue and xref this one. |
We did change these for … These came originally from R, fyi. |
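R's `factor` stores a vector of integer codes plus a table of levels, and pandas exposes the same decomposition through `pd.factorize`. A short sketch, assuming a recent pandas and hypothetical data: by default the uniques come out in order of first appearance, so a levels table built this way need not be sorted unless `sort=True` is passed.

```python
import pandas as pd

values = ["b", "a", "b", "c"]

# Default: uniques in order of first appearance (not sorted).
codes, uniques = pd.factorize(values)
print(codes.tolist())    # [0, 1, 0, 2]
print(uniques.tolist())  # ['b', 'a', 'c']

# sort=True: uniques sorted, codes remapped to match.
codes_s, uniques_s = pd.factorize(values, sort=True)
print(codes_s.tolist())    # [1, 0, 1, 2]
print(uniques_s.tolist())  # ['a', 'b', 'c']
```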
You can see a somewhat related discussion in #3268. |
I think the last comment by @behzadnouri gets at the heart of the problem. I'm not sure why the sorting methods look at …
|
Similar discussion in #8017. |
The following highlights the different behavior between the MultiIndex append method and setting with enlargement on a DataFrame with …
|
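The reproducible example for the comment above was lost. As a hedged stand-in with hypothetical data, assuming a recent pandas release whose sort_index sorts a MultiIndex by its values rather than by its raw labels, this builds the same three-row frame once with `MultiIndex.append` and once by setting with enlargement, then checks that `sort_index` orders both identically. On the affected versions discussed in this thread, the two results could differ.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples([("b", 2), ("b", 1)], names=["k1", "k2"])
df = pd.DataFrame({"x": [1, 2]}, index=idx)
new_row = pd.MultiIndex.from_tuples([("a", 1)], names=["k1", "k2"])

# Path 1: extend the index with MultiIndex.append and build a new frame.
appended = pd.DataFrame({"x": [1, 2, 3]}, index=df.index.append(new_row))

# Path 2: setting with enlargement via .loc on a key that does not exist yet.
enlarged = df.copy()
enlarged.loc[("a", 1), "x"] = 3

# The enlarged index may keep level 0 in insertion order, e.g. ['b', 'a'].
print(enlarged.index.levels[0].tolist())

for frame in (appended, enlarged):
    print(frame.sort_index().index.tolist())
```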
Quick question: what does "associated factor" mean here? |
What's the status on this one now? Is anyone actively working on a fix for the underlying issue? And, separately, is there any workaround for when "you just need to get the darn dataframe sorted" while we wait for/work on a more permanent fix for sorting? |
@8one6 Well, it's an active issue with no PR, so no one is working on it. You are welcome to. |
Got it. So in a pinch, I can just … |
No, just … |
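The suggestions in these two comments were cut off. One generic workaround, offered as an assumption rather than what was proposed in the thread, is to move the index levels into ordinary columns, sort on their actual values, and restore the index. The frame below is hypothetical and deliberately has labels that are lexsorted while level 0 is not:

```python
import pandas as pd

# Level 0 is ["b", "a"], so label order and value order disagree.
idx = pd.MultiIndex(
    levels=[["b", "a"], [1, 2]],
    codes=[[0, 0, 1, 1], [0, 1, 0, 1]],
    names=["k1", "k2"],
)
df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=idx)

# Sort by the index values as plain columns, then rebuild the MultiIndex.
workaround = df.reset_index().sort_values(["k1", "k2"]).set_index(["k1", "k2"])
print(workaround)
```

Because `set_index` builds a fresh MultiIndex from the sorted columns, the result does not carry over the original unsorted level order.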
Duplicate of #13431 |
I feel like I've encountered a bug. In the following scenario, the first sort_index call behaves as expected, but the second does not. Does anyone know what the difference is here?