
ENH: No numeric_only argument for pandas.core.groupby.GroupBy.rank() #44438

Open
notBillJames opened this issue Nov 13, 2021 · 4 comments
Labels
API - Consistency Internal Consistency of API/Behavior Enhancement Groupby


@notBillJames

`pandas.core.groupby.GroupBy.rank` does not have a `numeric_only` argument like `DataFrame.rank()` does.

I have a DataFrame with several statistics for baseball teams across different years. I want to rank each team in each statistic, grouped by season. Every column is numeric except `teamID`, which has object dtype and holds each team's name as a string. My code looks something like this:

```python
import pandas as pd

stats = pd.read_csv('stats.csv')
ranked = stats.groupby('yearID').rank(method='average', ascending=False)
ranked['teamID'] = stats['teamID']
```

Because this is `GroupBy.rank()`, I can't pass the `numeric_only` argument, so I have to reassign `ranked['teamID']` from the original column afterwards. I also cannot do

```python
ranked = stats.groupby(['yearID', 'teamID']).rank(...)
```

because that would give everybody a rank of 1.
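A minimal sketch of why grouping by both keys fails, using three rows from the 1947 sample table later in this thread: each `(yearID, teamID)` pair identifies exactly one row, so every group has a single member and its rank is always 1.

```python
import pandas as pd

# Three rows taken from the 1947 sample table in this thread
stats = pd.DataFrame({
    "yearID": [1947, 1947, 1947],
    "teamID": ["Athletics", "Braves", "Browns"],
    "W": [15.5, 21, 8],
})

# Every (yearID, teamID) group contains exactly one row,
# so rank() can only ever return 1.0 within it
ranked = stats.groupby(["yearID", "teamID"]).rank(ascending=False)
print(ranked)
```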

Is there a reason that `numeric_only` is included in `DataFrame.rank()` but not in `GroupBy.rank()`? Could it be added to `GroupBy.rank()`?

Then I could write it like this, which would be easier:

```python
import pandas as pd

stats = pd.read_csv('stats.csv')
ranked = stats.groupby('yearID').rank(method='average', ascending=False, numeric_only=False)
```
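Without such an argument, a similar effect can be approximated by selecting the numeric columns before ranking. This is a sketch of a workaround, not a `numeric_only` option; the name `stat_cols` is hypothetical, and the data are a few rows and columns from the 1947 sample table later in this thread.

```python
import pandas as pd

stats = pd.DataFrame({
    "yearID": [1947, 1947, 1947],
    "teamID": ["Athletics", "Braves", "Browns"],
    "W": [15.5, 21, 8],
    "WAR": [2.9, 5.5, 3.2],
})

# Hypothetical helper list: numeric columns minus the grouping key itself
stat_cols = list(stats.select_dtypes("number").columns.drop("yearID"))

# Rank only the numeric statistics within each season; the row index is
# preserved, so teamID can be attached back by index alignment
ranked = stats.groupby("yearID")[stat_cols].rank(method="average", ascending=False)
ranked.insert(0, "teamID", stats["teamID"])
print(ranked)
```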

I am just a hobbyist, so I don't know much about how these methods are implemented, which means there may be something I am completely overlooking, or another, more efficient way to do it. If so, I would appreciate some enlightenment about what I am missing. Thanks!

@notBillJames notBillJames added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 13, 2021
@Mislav-Ro

Mislav-Ro commented Nov 13, 2021

I think I understand what you want, but to be sure, could you provide a sample data set and describe what you are trying to accomplish? Perhaps provide two tables (input and output).

You are missing one step: you should specify what to do with the grouped values, for example whether to take the max or the min of each group.

I think this is what you are looking for: `pd.DataFrame.groupby().max().rank()`

Code for your case:

```python
ranked = stats.groupby('yearID').max().rank(method='average', ascending=False)
```

Hope this clarifies things a bit

P.S. You can also specify `numeric_only`:

```python
pd.DataFrame.groupby().max().rank(numeric_only=True)
```
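A quick check of what the `groupby().max().rank()` chain returns, using the 1947 rows from the sample table in this thread (restricted to two columns for brevity): `max()` aggregates each season down to a single row, so the subsequent `rank()` compares seasons with one another rather than teams within a season.

```python
import pandas as pd

# 1947 rows from the sample table in this thread (two statistics only)
stats = pd.DataFrame({
    "yearID": [1947] * 5,
    "teamID": ["Athletics", "Braves", "Browns", "Cardinals", "Cubs"],
    "W": [15.5, 21, 8, 15, 10.5],
    "WAR": [2.9, 5.5, 3.2, 4.45, 2.55],
})

# max() collapses each season to one row (the best value per column)
per_season = stats.groupby("yearID")[["W", "WAR"]].max()

# rank() then ranks the seasons against each other; with one season, all ranks are 1
season_ranks = per_season.rank(method="average", ascending=False)
print(season_ranks)
```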

@notBillJames
Author

notBillJames commented Nov 14, 2021

Basically, I want to know how each team ranked in each statistic in a given season.
The input table looks like this:

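| | yearID | teamID | W | IP | K% | K/BB+ | K%+ | WAR |
|---|---|---|---|---|---|---|---|---|
| 0 | 1947 | Athletics | 15.5 | 251.65 | 9.45 | 91.5 | 97 | 2.9 |
| 1 | 1947 | Braves | 21 | 277.6 | 11.1 | 155.5 | 116.5 | 5.5 |
| 2 | 1947 | Browns | 8 | 185.1 | 11.3 | 143 | 116 | 3.2 |
| 3 | 1947 | Cardinals | 15 | 196.05 | 12.5 | 168.5 | 131.5 | 4.45 |
| 4 | 1947 | Cubs | 10.5 | 195 | 10.25 | 119 | 107.5 | 2.55 |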

The output table I would like looks like this:

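| | yearID | teamID | W | IP | K% | K/BB+ | K%+ | WAR |
|---|---|---|---|---|---|---|---|---|
| 0 | 1947 | Athletics | 2 | 2 | 5 | 5 | 5 | 4 |
| 1 | 1947 | Braves | 1 | 1 | 3 | 2 | 2 | 1 |
| 2 | 1947 | Browns | 5 | 5 | 2 | 3 | 3 | 3 |
| 3 | 1947 | Cardinals | 3 | 3 | 1 | 1 | 1 | 2 |
| 4 | 1947 | Cubs | 4 | 4 | 4 | 4 | 4 | 5 |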

So the 1947 Braves finished first in W, IP, and WAR. My plan is to add a new column holding the average of all the other rankings, then sort by that average to see who was the most "dominant" in those areas. Would doing
`ranked = stats.groupby('yearID').max().rank(method='average', ascending=False)` accomplish that?
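The averaging step described above could look like the following sketch. It starts from the ranked 1947 values in the output table; the column name `avg_rank` is a hypothetical choice, and lower averages mean more dominant teams.

```python
import pandas as pd

# Per-season ranks copied from the output table above
ranked = pd.DataFrame({
    "yearID": [1947] * 5,
    "teamID": ["Athletics", "Braves", "Browns", "Cardinals", "Cubs"],
    "W": [2, 1, 5, 3, 4],
    "IP": [2, 1, 5, 3, 4],
    "K%": [5, 3, 2, 1, 4],
    "K/BB+": [5, 2, 3, 1, 4],
    "K%+": [5, 2, 3, 1, 4],
    "WAR": [4, 1, 3, 2, 5],
})

stat_cols = ["W", "IP", "K%", "K/BB+", "K%+", "WAR"]

# Hypothetical new column: each team's mean rank across all statistics
ranked["avg_rank"] = ranked[stat_cols].mean(axis=1)

# Within each season, put the lowest (best) average rank first
dominance = ranked.sort_values(["yearID", "avg_rank"])
print(dominance[["yearID", "teamID", "avg_rank"]])
```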

I did find a workaround:

```python
import pandas as pd

ranked = stats.set_index(['yearID', 'teamID'])
ranked = ranked.groupby('yearID').rank(ascending=False).reset_index()
```

and that yields the same result as above.
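A self-contained run of this workaround on the 1947 input table from this comment. Moving `yearID` and `teamID` into the index leaves only numeric statistics as columns, `groupby('yearID')` then groups by that index level, and `reset_index()` restores both keys as columns. The ranks come back as floats (1.0, 2.0, ...) rather than the integers shown in the table.

```python
import pandas as pd

# Input table from this comment (1947 season)
stats = pd.DataFrame({
    "yearID": [1947] * 5,
    "teamID": ["Athletics", "Braves", "Browns", "Cardinals", "Cubs"],
    "W": [15.5, 21, 8, 15, 10.5],
    "IP": [251.65, 277.6, 185.1, 196.05, 195],
    "K%": [9.45, 11.1, 11.3, 12.5, 10.25],
    "K/BB+": [91.5, 155.5, 143, 168.5, 119],
    "K%+": [97, 116.5, 116, 131.5, 107.5],
    "WAR": [2.9, 5.5, 3.2, 4.45, 2.55],
})

# Keys go into the index, so only the numeric statistics get ranked
ranked = stats.set_index(["yearID", "teamID"])

# Group by the 'yearID' index level, rank descending, then restore the keys
ranked = ranked.groupby("yearID").rank(ascending=False).reset_index()
print(ranked)
```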

@Mislav-Ro

I see.
Indexes do get erased when calling `groupby().rank()`, and `numeric_only` cannot be specified that way either. We could try to implement this.

Your workaround looks fine.

Could I be assigned to this issue once someone else confirms it?

@notBillJames
Author

Okay, thank you for the help!

@notBillJames notBillJames reopened this Nov 15, 2021
@rhshadrach rhshadrach added API - Consistency Internal Consistency of API/Behavior Groupby labels Dec 5, 2021
@mroeschke mroeschke removed the Needs Triage Issue that has not been reviewed by a pandas team member label Dec 27, 2021