nth groupby method on DataFrame #5552
Maybe I just don't understand what nth does; the result is not always empty.
It's also really slow: http://stackoverflow.com/a/20087789/1240268. I'm of the opinion that nth should take the nth row regardless of NaN, or take a kwarg for NaN handling. I need to think through the logic of what it is doing atm (it's only small!).
For convenience the current impl is:
With a frame the
I like the idea of a kwarg to describe this; grr to the current behaviour of Series and DataFrame being inconsistent. Not sure how we should play this... In any case we should use cumcount, which is much faster.
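A minimal sketch of the cumcount idea, for illustration only: `take_nth` and `take_nth_slow` are hypothetical helper names, not pandas APIs. `cumcount()` numbers the rows inside each group 0, 1, 2, ... in their original order, so taking the nth row of every group (NaN or not) becomes one vectorized comparison instead of a Python loop over groups.

```python
import numpy as np
import pandas as pd


def take_nth(df, by, n):
    """Hypothetical helper: keep the nth row (0-based) of each group, NaN or not.

    cumcount numbers rows within each group 0, 1, 2, ... in original order,
    so a single boolean mask selects the nth rows of all groups at once.
    """
    return df[df.groupby(by).cumcount() == n]


def take_nth_slow(df, by, n):
    """Per-group loop, for comparison only: same result, but much slower."""
    parts = [g.iloc[[n]] for _, g in df.groupby(by) if len(g) > n]
    # groupby iterates in sorted key order; restore the original row order
    return pd.concat(parts).sort_index()


df = pd.DataFrame({
    "key": ["a", "a", "b", "b", "b", "c"],
    "val": [1.0, np.nan, 3.0, 4.0, np.nan, 6.0],
})

fast = take_nth(df, "key", 1)
slow = take_nth_slow(df, "key", 1)
print(fast)
```

The loop version is only there to show the two agree on this small frame. Groups with fewer than n + 1 rows (here "c") are simply absent from the result rather than filled with NaN.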
Another possibility is to plonk/override this into SeriesGroupBy "as is"*, but have the GroupBy version just take the nth row regardless of NaN, at least for now. Also, atm this is not that great, as it returns NaN for groups smaller than n. (* but using cumcount.)
Any thoughts on this, @jreback @TomAugspurger? Should we somehow deprecate the old Series behaviour (getting the nth non-null result; I think this may be something R does, which is why we do it)? Not sure on the way forward. I still don't understand how this even works on a DataFrame... A cumcount impl is much easier (it's confusing for it to differ from Series, though it already does).
I think you should break the API and have na='drop' (but default to na=None, meaning don't remove NaNs).
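A rough sketch of how the proposed `na` keyword could behave on a Series, assuming the cumcount approach. `series_nth` is a hypothetical function, and `na=None` / `na='drop'` is only the spelling suggested above, not an existing pandas argument.

```python
import numpy as np
import pandas as pd


def series_nth(s, by, n, na=None):
    """Hypothetical sketch of the proposed keyword (not a pandas API).

    na=None   -> take the nth value of each group, NaN or not.
    na='drop' -> drop NaNs first, then take the nth remaining value
                 (i.e. the "nth non-NaN" Series behaviour).
    """
    if na == "drop":
        mask = s.notna()
        s, by = s[mask], by[mask]
    elif na is not None:
        raise ValueError("na must be None or 'drop'")
    return s[s.groupby(by).cumcount() == n]


s = pd.Series([np.nan, 2.0, 3.0, np.nan, 5.0])
keys = pd.Series(["a", "a", "a", "b", "b"])

print(series_nth(s, keys, 0))             # first row of each group: both NaN
print(series_nth(s, keys, 0, na="drop"))  # first non-NaN: 2.0 for a, 5.0 for b
```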
Should we have something similar for DataFrames? It could be drop_na (I think this is in line with other args). Do you think this arg should only be for Series, or should we hack something that works for DataFrames too?
I think for frames you could do any/all.
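Extending the same idea to frames with any/all, as a sketch only: `frame_nth` and its `dropna` argument are hypothetical, and it assumes a single grouping column that is left out of the NaN check. `DataFrame.dropna(how=..., subset=...)` decides which rows are skipped before the rows are counted.

```python
import numpy as np
import pandas as pd


def frame_nth(df, by, n, dropna=None):
    """Hypothetical sketch: nth row per group of a DataFrame.

    dropna=None  -> count every row.
    dropna='any' -> skip rows with a NaN in any non-key column.
    dropna='all' -> skip rows whose non-key columns are all NaN.
    """
    if dropna is not None:
        if dropna not in ("any", "all"):
            raise ValueError("dropna must be None, 'any' or 'all'")
        value_cols = [c for c in df.columns if c != by]
        df = df.dropna(how=dropna, subset=value_cols)
    return df[df.groupby(by).cumcount() == n]


df = pd.DataFrame({
    "key": ["a", "a", "a"],
    "x": [np.nan, 1.0, 2.0],
    "y": [np.nan, np.nan, 3.0],
})

print(frame_nth(df, "key", 0))                # row 0 (all NaN)
print(frame_nth(df, "key", 0, dropna="all"))  # row 1 (x present, y NaN)
print(frame_nth(df, "key", 0, dropna="any"))  # row 2 (no NaNs)
```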
Sure, I think that's not so bad. I will remember to use
I'm not sure about the API for keeping the old behaviour (i.e. not just filtering). For one thing, there are two NaNs:
For another, these are two very different results: you only get the groupby's `by` info (the group keys) if you use the old method. I still think we should change... just not sure on the API for it.
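A small illustration of both points, assuming the old behaviour can be approximated by a result indexed by the group keys (this is a stand-in, not the actual old implementation). Both groups come out as NaN in the reduce-style result, but for different reasons: "a" has a real NaN in its second row, while "b" has no second row at all.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": ["a", "a", "b"],
    "val": [1.0, np.nan, 3.0],
})
n = 1
pos = df.groupby("key").cumcount()

# Filter: a sub-frame keeping the original row labels. Group "b" has no
# row at position 1, so it simply does not appear.
filtered = df[pos == n]

# Reduce-style stand-in: one value per group, indexed by the group keys.
# Reindexing to every key turns the missing group "b" into NaN as well.
reduced = filtered.set_index("key")["val"].reindex(df["key"].unique())

print(filtered)
print(reduced)
```

Only the reduce-style result carries the group keys, and once it does, a genuine NaN and a too-short group look identical.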
Travis is now happy... so go ahead at your leisure.
@jreback still unsure about the API, any thoughts? I guess the old behaviour is an R thing? It's annoying that these are two very different things:
a. Filter the frame and series to take just the nth rows; the result is a sub-frame/sub-series.
* Amusingly, I think the iget call I was confused about raises, but the error is caught, so you get NaN.
I think you need to support both cases, but make a) the default.
so
The nth groupby method on a Series takes the nth non-NaN value in the Series.
This means that on a DataFrame it's not going to be well defined...
Should we make this a Series-only method?
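To see why "nth non-NaN" is awkward for a frame: applied column by column, it can stitch together values from different rows. `groupby().first()`, which takes the first non-NaN value in each column, shows this for n = 0:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": ["a", "a"],
    "x": [np.nan, 2.0],
    "y": [10.0, np.nan],
})

# First non-NaN value per column, per group: x comes from row 1,
# y comes from row 0.
firsts = df.groupby("key").first()
print(firsts)
```

The resulting row (x=2.0, y=10.0) does not exist in the original frame, whereas taking the whole first row would give x=NaN, y=10.0.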