Index order of the object returned by value_counts differs from pandas #1650

Open
YarShev opened this issue Jun 23, 2020 · 2 comments
Labels
bug 🦗 Something isn't working P2 Minor bugs or low-priority feature requests pandas concordance 🐼 Functionality that does not match pandas

Comments

YarShev commented Jun 23, 2020

Describe the problem

The index order of the object returned by value_counts and Series.value_counts differs from the pandas implementation. pandas does not sort the indices of entries with equal counts, so their order looks arbitrary, whereas we implemented logic that sorts both values and indices.

Source code / logs

import modin.pandas as pd
import pandas
import numpy as np
NROWS = 2 ** 8
RAND_LOW = 0
RAND_HIGH = 100
random_state = np.random.RandomState(seed=42)
data = random_state.randint(RAND_LOW, RAND_HIGH, size=(NROWS))
ms = pd.Series(data)
ps = pandas.Series(data)
mr = ms.value_counts()
pr = ps.value_counts()
mr[:10]  # counts sorted descending; indices with equal counts are also sorted descending
61    12
1      8
87     6
43     6
14     6
89     5
88     5
59     5
58     5
2      5
pr[:10]  # counts sorted descending; indices with equal counts are not sorted
61    12
1      8
43     6
14     6
87     6
88     5
2      5
59     5
58     5
89     5
@devin-petersohn devin-petersohn added the bug 🦗 Something isn't working label Jul 24, 2020
@devin-petersohn devin-petersohn added this to the 0.8.2 milestone Jul 24, 2020
@prutskov prutskov assigned prutskov and unassigned prutskov Oct 6, 2020
@devin-petersohn devin-petersohn removed this from the 0.8.2 milestone Oct 19, 2020
YarShev commented Nov 18, 2020

I think we can improve the execution time of Series.value_counts by removing the logic that sorts indices for equal values. That logic can be kept in the tests for now so that Modin and pandas results compare equal.
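A minimal sketch of how tests might keep the tie ordering out of the comparison, assuming it is enough to normalize both results before checking equality. The helper name `normalize_value_counts` is hypothetical and is not an existing Modin or pandas utility; the shuffled Series below only stands in for a result whose tie order differs.

```python
import numpy as np
import pandas


def normalize_value_counts(counts: pandas.Series) -> pandas.Series:
    """Hypothetical test helper: order by count, then by index, both descending."""
    # Move the index into a column so both keys can be sorted together.
    frame = counts.rename("count").rename_axis("value").reset_index()
    frame = frame.sort_values(["count", "value"], ascending=[False, False])
    return frame.set_index("value")["count"]


# Same data as in the report above.
data = np.random.RandomState(seed=42).randint(0, 100, size=2 ** 8)
pr = pandas.Series(data).value_counts()

# Stand-in for a result with a different row order (assumption for illustration).
shuffled = pr.sample(frac=1, random_state=0)

# After normalization the two results compare equal regardless of tie order.
pandas.testing.assert_series_equal(
    normalize_value_counts(pr), normalize_value_counts(shuffled)
)
```

With a helper like this, the sorting cost is paid only in the tests, while value_counts itself could return ties in whatever order is cheapest.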

@YarShev
Copy link
Collaborator Author

YarShev commented Nov 18, 2020

The output for the scenario above after removing the sorting logic (#2454).

mr[:10]
61    12
1      8
43     6
14     6
87     6
59     5
2      5
88     5
89     5
58     5

@pyrito pyrito added pandas concordance 🐼 Functionality that does not match pandas P2 Minor bugs or low-priority feature requests labels Aug 18, 2022