ENH/BUG groupby nth now filters, works with DataFrames #6569

Merged
merged 2 commits into from
Mar 7, 2014

hayd
Contributor

@hayd hayd commented Mar 7, 2014

fixes #5552

partial for #5264

In [101]: df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=['A', 'B'])

In [102]: g = df.groupby('A')

In [103]: g.nth(0)
Out[103]:
   A   B
0  1 NaN
2  5   6

In [104]: g.nth(1)
Out[104]:
   A  B
1  1  4

In [105]: g.nth(-1)
Out[105]:
   A  B
1  1  4
2  5  6

In [106]: g.nth(0, dropna='any')  # old behaviour-like
Out[106]:
   B
A
1  4
5  6

In [107]: g.nth(1, dropna='any')  # old behaviour-like
Out[107]:
    B
A
1 NaN
5 NaN
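
The filtering behaviour in Out[103] to Out[105] can be sketched with `cumcount`, which the discussion further down mentions as the machinery behind the new `nth`. This is a minimal sketch under that assumption, not necessarily the code merged in this PR; `nth_filter` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=["A", "B"])


def nth_filter(frame, key, n):
    """Hypothetical helper: keep the n-th row of each group, original index intact."""
    g = frame.groupby(key)
    if n >= 0:
        # cumcount numbers the rows of each group 0, 1, 2, ...
        mask = g.cumcount() == n
    else:
        # ascending=False numbers the rows from the end of each group
        mask = g.cumcount(ascending=False) == -n - 1
    return frame[mask]


first_rows = nth_filter(df, "A", 0)    # rows 0 and 2, as in Out[103]
second_rows = nth_filter(df, "A", 1)   # row 1 only, as in Out[104]
last_rows = nth_filter(df, "A", -1)    # rows 1 and 2, as in Out[105]
```

Groups with fewer than n + 1 rows simply contribute nothing, which is why group 5 is absent from the `nth(1)` result.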

@jreback jreback added this to the 0.14.0 milestone Mar 7, 2014
@hayd
Contributor Author

hayd commented Mar 7, 2014

Also note the old behaviour was not stable/correct for negative n (now fixed with this PR and dropna):

In [9]: g.nth(-3,)
Out[9]:
               B
A
1  2.144760e-314
5  2.124748e-314

In [10]: g.B.nth(-3,)
Out[10]:
A
1    2.144760e-314
5    2.144337e-314
Name: B, dtype: float64
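
For comparison, a small check of the same call under the filtering semantics, using the example frame from the PR description. The assumption here is that an out-of-range n selects no rows, so the result should be empty instead of holding uninitialised values like the ones above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=["A", "B"])
g = df.groupby("A")

# No group has three rows, so there is no third-from-last row to pick.
out_of_range_frame = g.nth(-3)
out_of_range_series = g["B"].nth(-3)

print(len(out_of_range_frame), len(out_of_range_series))
```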

@jreback
Contributor

jreback commented Mar 7, 2014

If you get around to it: I suspect the new method is MUCH faster than the old one, so maybe add a vbench.

@hayd
Contributor Author

hayd commented Mar 7, 2014

Will append a vbench. It's much faster, except when applying to a DataFrame with dropna (old-style), where it's a little slower, but that was previously borked.

Um, obviously there is overlap with the first and last methods: they can be got with nth(0) and nth(-1), but I've not tested the differences yet... you reckon these should change too?

@jreback
Contributor

jreback commented Mar 7, 2014

yes I think you should blow away first/last code and just alias them to nth(0) and nth(-1).

Reminds me: pls add some tests that deal with different types (because first/last have this convert arg... though not sure why).
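
One difference worth keeping in mind before aliasing: `first()` returns the first non-null value in each group, while `nth(0)` is purely positional. A small illustration on the example frame from the PR description (only the values are compared, since the index of the `nth` result depends on the pandas version):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=["A", "B"])
g = df.groupby("A")

first_non_null = g["B"].first()  # skips the NaN in group 1
nth_zero = g["B"].nth(0)         # takes the first row of each group as-is

print(first_non_null.values)
print(nth_zero.values)
```

So a straight alias would change what `first()` returns for groups whose first row holds a NaN, unless the alias also passes a `dropna` option.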

@jreback
Contributor

jreback commented Mar 7, 2014

though maybe nth(0) (first) and (iloc[0]) deserve a fast-path, as they don't need the machinery of cumcount
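
One possible shape for such a fast path, purely as a sketch: for a single grouping column with no missing keys, the first row of each group is the first occurrence of each key, which `duplicated` finds without numbering every row. Whether this is actually faster is not measured here, and it is not the approach taken in the PR.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=["A", "B"])

# General route: number the rows within each group, keep those numbered 0.
first_via_cumcount = df[df.groupby("A").cumcount() == 0]

# Candidate shortcut (assumes one key column and no NaN keys):
# keep every row whose key has not been seen before.
first_via_duplicated = df[~df["A"].duplicated()]

print(first_via_cumcount.equals(first_via_duplicated))
```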

@hayd
Contributor Author

hayd commented Mar 7, 2014

there was/is a weird test for types of first/last/nth, I tweaked it a little but it is still there...

I can iterate tests over a few of the same df (but with different column types), is that what you mean?
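
A rough sketch of what iterating over column types could look like. The frame, keys and dtype list are made up for illustration and are not the test added in this PR.

```python
import pandas as pd

keys = ["foo", "foo", "bar"]

# Same shape of data each time, only the dtype of column B changes.
columns = {
    "int": pd.Series([1, 2, 3], dtype="int64"),
    "float": pd.Series([1.5, 2.5, 3.5], dtype="float64"),
    "object": pd.Series(["x", "y", "z"], dtype=object),
    "datetime": pd.Series(pd.date_range("2014-03-07", periods=3)),
}

checked = []
for name, values in columns.items():
    frame = pd.DataFrame({"A": keys, "B": values})
    grouped = frame.groupby("A")
    for result in (grouped["B"].first(), grouped["B"].last(), grouped["B"].nth(0)):
        # With no missing values, each method should preserve the column dtype.
        assert result.dtype == frame["B"].dtype, name
    checked.append(name)
```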

Yea, re fast path (will see how they compare to the cumcount for now)...

@@ -165,10 +164,10 @@ def test_first_last_nth(self):
grouped['B'].last()
grouped['B'].nth(0)

self.df['B'][self.df['A'] == 'foo'] = np.nan

hmm... this should have actually raised a SettingWithCopy (as the test suite sets it to raise)... weird
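
The line under review assigns through chained indexing (`df['B'][mask] = ...`). For reference, the single-step `.loc` form below does the same assignment directly on the frame, so the copy question does not arise. The data here is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo"], "B": [1.0, 2.0, 3.0]})

# Chained form from the test: df["B"][df["A"] == "foo"] = np.nan
# One .loc call with a row mask and a column label assigns on df itself.
df.loc[df["A"] == "foo", "B"] = np.nan

print(df)
```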

@hayd
Contributor Author

hayd commented Mar 7, 2014

Added a vbench: it's about 40 times faster, not including the setup of the groupby (which is included in the bench).
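
To see how much of a timing is groupby setup, the two cases can be timed separately. This sketch uses the standard-library `timeit` as a stand-in for vbench; the frame size and repeat count are arbitrary assumptions, and no particular speed-up is implied.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame({
    "A": rng.randint(0, 100, size=10000),
    "B": rng.randn(10000),
})

# Case 1: the groupby object is rebuilt on every call, as in the bench.
t_with_setup = timeit.timeit(lambda: df.groupby("A").nth(0), number=20)

# Case 2: the groupby object is built once and reused.
g = df.groupby("A")
t_without_setup = timeit.timeit(lambda: g.nth(0), number=20)

print("with setup:    %.4f s" % t_with_setup)
print("without setup: %.4f s" % t_without_setup)
```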

@jreback
Contributor

jreback commented Mar 7, 2014

awesome!

hayd added a commit that referenced this pull request Mar 7, 2014
ENH/BUG groupby nth now filters, works with DataFrames
@hayd hayd merged commit 6e758b7 into pandas-dev:master Mar 7, 2014
@hayd hayd deleted the groupby_nth branch March 7, 2014 20:24
Successfully merging this pull request may close these issues.

nth groupby method on DataFrame