GroupBy.nth includes group key inconsistently #12839

Closed
sinhrks opened this issue Apr 9, 2016 · 9 comments

@sinhrks
Member

sinhrks commented Apr 9, 2016

Code Sample, a copy-pastable example if possible

nth doesn't include the group key as a column, the same as first and last.

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [1, 2, 3, 4, 5],
                   'G': [1, 1, 2, 2, 1]})

g = df.groupby('G')
g.nth(1)
#    A  B
# G      
# 1  2  2
# 2  4  4

However, calling head first changes the behavior. This looks to be caused by _set_selection_from_grouper caching its selection.

g = df.groupby('G')
g.head()
g.nth(1)
#    A  B  G
# G         
# 1  2  2  1
# 2  4  4  2

Expected Output

nth should always return the result below, regardless of earlier calls.

g.nth(1)
#    A  B
# G      
# 1  2  2
# 2  4  4
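
The expectation can also be phrased as a small check: nth on a fresh groupby object and nth after a prior head call should agree. This is only a sketch using the frame from this report. The variable names are illustrative, and because what nth returns may differ between pandas versions, it compares the two calls to each other rather than to a fixed output.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [1, 2, 3, 4, 5],
                   'G': [1, 1, 2, 2, 1]})

# nth on a groupby object that has had no other method called on it
fresh = df.groupby('G').nth(1)

# nth after head has been called on the same groupby object
g = df.groupby('G')
g.head()
after_head = g.nth(1)

# Both calls should select the same columns and produce the same frame
same_columns = list(fresh.columns) == list(after_head.columns)
same_result = fresh.equals(after_head)
print(same_columns, same_result)
```

On a version affected by this issue, the two results would be expected to differ, since the second one gains the extra G column.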

output of pd.show_versions()

current master.

@sinhrks sinhrks added this to the 0.18.1 milestone Apr 9, 2016
@jreback
Contributor

jreback commented Apr 9, 2016

this is fixed in #11039 (but it IS a change)

@jreback
Contributor

jreback commented Apr 9, 2016

also some related issues there #7569

@sinhrks
Member Author

sinhrks commented Apr 9, 2016

Thx, I hadn't noticed. This is also related to #7453 (last nth issue).

@sinhrks
Member Author

sinhrks commented Apr 9, 2016

We can enable the nth tests located in test_resample once both this and #12840 are fixed.

@sinhrks
Member Author

sinhrks commented Apr 27, 2016

This isn't fixed by #11039. Reopening.

@sinhrks sinhrks reopened this Apr 27, 2016
@sinhrks sinhrks modified the milestones: 0.18.2, 0.18.1 Apr 27, 2016
@jreback
Contributor

jreback commented Apr 27, 2016

In [1]: df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [1, 2, 3, 4, 5],
   ...:                    'G': [1, 1, 2, 2, 1]})

In [2]: g = df.groupby('G')

In [3]: g.nth(1)
Out[3]: 
   A  B
G      
1  2  2
2  4  4

In [4]: g.head()
Out[4]: 
   A  B
0  1  1
1  2  2
2  3  3
3  4  4
4  5  5

@sinhrks
Member Author

sinhrks commented Apr 27, 2016

Call head before nth.

g = df.groupby('G')
g.head()
g.nth(1)
#    A  B  G
# G         
# 1  2  2  1
# 2  4  4  2

@jreback
Contributor

jreback commented Apr 27, 2016

@sinhrks ahh, so something is changing state. ok!
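
For readers unfamiliar with this kind of bug, here is a toy illustration of how a selection cached on the object can leak from one method into the next. It is not pandas' actual code: ToyGroupBy, _selection, and _selected_columns are hypothetical names made up for this sketch, and the real mechanism around _set_selection_from_grouper may well differ.

```python
class ToyGroupBy:
    """Hypothetical stand-in for a groupby object; not pandas internals."""

    def __init__(self, columns, key):
        self.columns = list(columns)
        self.key = key
        self._selection = None  # cached column selection shared by all methods

    def _selected_columns(self):
        # Default when nothing is cached: every column except the group key
        if self._selection is None:
            return [c for c in self.columns if c != self.key]
        return self._selection

    def head(self):
        # Buggy on purpose: caches a selection that includes the key
        # and never resets it afterwards
        self._selection = list(self.columns)
        return self._selected_columns()

    def nth(self):
        # Uses whatever selection happens to be cached at call time
        return self._selected_columns()


g = ToyGroupBy(['A', 'B', 'G'], key='G')
before = g.nth()   # ['A', 'B']
g.head()           # mutates the cached selection as a side effect
after = g.nth()    # ['A', 'B', 'G']
print(before, after)
```

The point of the sketch is only that nth itself never changes; its result depends on state left behind by an earlier call.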

@a-p-man
Contributor

a-p-man commented May 27, 2016

On a related note, which version of g.head() below is expected?

In [2]: g = df.groupby('G')

In [3]: g.head()
Out[3]: 
   A  B  G
0  1  1  1
1  2  2  1
2  3  3  2
3  4  4  2
4  5  5  1

In [4]: g = df.groupby('G')

In [5]: g.nth(1)
Out[5]: 
   A  B
G      
1  2  2
2  4  4

In [6]: g.head()
Out[6]: 
   A  B
0  1  1
1  2  2
2  3  3
3  4  4
4  5  5

@jreback jreback closed this as completed in cc0a188 Jul 6, 2016
nateGeorge pushed a commit to nateGeorge/pandas that referenced this issue Aug 15, 2016
closes pandas-dev#12839

Author: adneu <aneumann31@gmail.com>

Closes pandas-dev#13316 from adneu/12839 and squashes the following commits:

16f5cd3 [adneu] Name change
ac1851a [adneu] Added docstrings/comments, and new tests.
4d73cbf [adneu] Updated tests
9b75df4 [adneu] BUG: Groupby.nth includes group key inconsistently pandas-dev#12839