ENH: guarantee pandas.Series.value_counts "sort=False" to be original ordering #12679
Comments
@jreback |
No, this is an arbitrary order, subject to how the hashtable works. It's not a guarantee of ANY kind of ordering. Typically this routine sorts by the largest counts first, and that is almost always what you want. I think this could do what you want (e.g. original ordering)
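A minimal sketch of one way to get counts in order of first appearance, assuming a plain `Series` whose missing values should be ignored. `value_counts_in_order` is a hypothetical helper name, not a pandas API. It relies on `pd.unique` returning values in order of appearance and reindexes the unsorted counts accordingly:

```python
import pandas as pd


def value_counts_in_order(s: pd.Series) -> pd.Series:
    """Count values, ordered by first appearance in `s` (hypothetical helper)."""
    # Unsorted counts: do not rely on the order of this intermediate result.
    counts = s.value_counts(sort=False)
    # pd.unique keeps order of appearance; drop NaN because value_counts
    # excludes missing values by default.
    order = pd.unique(s.dropna())
    return counts.reindex(order)


s = pd.Series(["b", "a", "b", "c", "a", "b"])
result = value_counts_in_order(s)
print(result)
```

This costs a second pass over the data for `pd.unique`, so it is a workaround rather than a fix inside `value_counts` itself.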
yeah, I would say we could change this to have |
+1. Having the order depend on the internals of the hashtable (and so also on the dtype of your data) does not seem very useful as an actual return value for |
For anyone looking to tackle this, IIRC there isn't a lot of code sharing between |
The order is changed from
and map the result to new lists from the order of the index list:
I am not sure whether it's worth changing or whether there's a better way to solve the problem. |
+1 for maintaining the original order |
Ensure that value_counts returns its index in the same order as the input object when sorting the values, whether ascending or descending. This fixes pandas-dev#12679.
Is this closed by #32449? |
Pretty sure this is OK now that we have consistent hashing. cc @realead if you want to have a look |
IIUC, in order to preserve the original order, the hashmap needs to be insertion-ordered (like CPython's dicts in Py3.6+), but khash maps aren't. Changes to the hash functions will not fix that. It looks like there are at least two approaches at hand:
The second option seems to be the "more fundamental" fix and is probably faster if the uniques aren't precalculated. However, it will still lead to a performance decrease, which might be an issue. A perfect solution would offer the following options for the order of the output:
I'm not sure how large the overhead for "insertion ordered" could be. Assuming that cache misses are the bottleneck, it could be up to 50%-100% slower. |
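To illustrate what "insertion-ordered" means here, a pure-Python sketch using a built-in `dict` (which preserves insertion order in CPython 3.6+ and by language guarantee from Python 3.7). This is only an analogy for the behaviour being discussed, not how pandas or khash implements counting, and `ordered_value_counts` is a hypothetical name:

```python
def ordered_value_counts(values):
    """Count occurrences, keeping keys in order of first appearance."""
    counts = {}
    for v in values:
        # A key is inserted the first time it is seen; later updates
        # change the count but not the key's position in the dict.
        counts[v] = counts.get(v, 0) + 1
    return counts


counts = ordered_value_counts(["b", "a", "b", "c", "a", "b"])
print(list(counts.items()))  # [('b', 3), ('a', 2), ('c', 1)]
```

An insertion-ordered table typically keeps the entries in a separate dense array next to the hash index, which is one plausible source of the extra memory traffic mentioned above.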
Thanks for the analysis @realead. I don't think performance is a big deal here. I would opt for insertion order (via option 2) if it's not too complicated (it sounds like we already have some of this, so maybe that would be fine). If that is not feasible, then a fix à la |
Hello,
I'm trying to make a new DataFrame that contains the value counts of a column of an existing DataFrame (spreadsheet.xlsx), but I want the rows in the new DataFrame to be in the same order as the old one.
When I do:
I get the DataFrame:
When what I really want is:
...because that's the order in which the values occur in the original DataFrame.
I can't tell how it's sorting them, but it is somehow. Is this the expected behavior?
Thanks