
groupby.nth with dropna is buggy #11038

Closed
behzadnouri opened this issue Sep 9, 2015 · 4 comments · Fixed by #53519
Labels: Bug · Filters (e.g. head, tail, nth) · Groupby · Needs Tests (unit test(s) needed to prevent regressions)
Milestone: 2.1
Comments

@behzadnouri
Contributor

>>> df = pd.DataFrame([[1],[2,3],[1],[2,4]], columns=['1st', '2nd'])
>>> df
   1st  2nd
0    1  NaN
1    2    3
2    1  NaN
3    2    4
>>> df.groupby('1st')['2nd'].nth(0, dropna=True)
1st
1    3
2    4
Name: 2nd, dtype: float64
@tnir
Contributor

tnir commented Oct 12, 2015

@behzadnouri You should pass one of None, 'any', or 'all' as dropna, not True; we should deprecate passing True to nth.

However, even when you pass 'any' or 'all', there is still a bug.

>>> df = pd.DataFrame([[1],[2,3],[1],[2,4]], columns=['1st', '2nd'])
>>> df
   1st  2nd
0    1  NaN
1    2    3
2    1  NaN
3    2    4
>>> df.groupby('1st')['2nd'].nth(0, dropna='any')
1st
1    3
2    4
Name: 2nd, dtype: float64

>>> df.groupby('1st')['2nd'].nth(0, dropna='all')
1st
1    3
2    4
Name: 2nd, dtype: float64

Exchanging the order of the column selection and nth() yields what we expect to be the correct results:

>>> df.groupby('1st').nth(0, dropna='any')['2nd']
1st
1   NaN
2     3
Name: 2nd, dtype: float64

>>> df.groupby('1st').nth(0, dropna='all')['2nd']
1st
1   NaN
2     3
Name: 2nd, dtype: float64

@jreback
Contributor

jreback commented Oct 14, 2015

@tnir yep, the dropna argument is a bit too permissive; it should be validated at the top of nth, rather than in the conditional (which misses some cases).

As always, pull requests to fix this are welcome.

@tnir
Contributor

tnir commented Sep 10, 2017

This is still reproducible with pandas 0.20.3 on Python 3.6:

$ python
Python 3.6.0 (default, Dec 26 2016, 15:43:50)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
>>> import pandas as pd
>>> df = pd.DataFrame([[1],[2,3],[1],[2,4]], columns=['1st', '2nd'])
>>> df
   1st  2nd
0    1  NaN
1    2  3.0
2    1  NaN
3    2  4.0
>>> df.groupby('1st')['2nd'].nth(0, dropna='any')
1st
1    3.0
2    4.0
Name: 2nd, dtype: float64
>>> df.groupby('1st')['2nd'].nth(0, dropna='all')
1st
1    3.0
2    4.0
Name: 2nd, dtype: float64
>>> df.groupby('1st').nth(0, dropna='any')['2nd']
1st
1    NaN
2    3.0
Name: 2nd, dtype: float64
>>> df.groupby('1st').nth(0, dropna='all')['2nd']
1st
1    NaN
2    3.0
Name: 2nd, dtype: float64

tnir added a commit to tnir/pandas that referenced this issue Sep 10, 2017
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@rhshadrach
Member

This now raises ValueError: For a DataFrame or Series groupby.nth, dropna must be either None, 'any' or 'all', (was passed True).

The code from #11038 (comment) now results in

1    3.0
Name: 2nd, dtype: float64

1    3.0
Name: 2nd, dtype: float64

This deviates from the expected behavior mentioned there, but looks right to me. In particular, by selecting the column first, dropna now drops all of group 1 (both of its '2nd' values are NaN). Also note that the 1 in the result is not the group key, but the index value of the row 1 2 3.0 in the original DataFrame, which belongs to group 2.
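The per-column semantics described above can be approximated by dropping NaNs from the selected column before grouping. A sketch, assuming index alignment of the grouping key is acceptable; it reproduces the values, though the fixed nth keeps the original row index rather than the group key:

```python
import pandas as pd

df = pd.DataFrame([[1], [2, 3], [1], [2, 4]], columns=["1st", "2nd"])

# Selecting the column and dropping NaNs first means group 1 (whose '2nd'
# values are all NaN) disappears entirely, and group 2 keeps rows with
# values 3.0 and 4.0; taking the first per group then yields 3.0.
result = df["2nd"].dropna().groupby(df["1st"]).first()
print(result)
```

Here the grouper df["1st"] is aligned to the surviving index labels (1 and 3), so both remaining rows fall into group 2.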

@rhshadrach rhshadrach added Needs Tests Unit test(s) needed to prevent regressions Filters e.g. head, tail, nth labels Jun 4, 2023
@rhshadrach rhshadrach added this to the 2.1 milestone Jun 4, 2023