groupby enumerate method #4646
Comments
counter() ? |
I kind of favour I was wrong: count isn't taken for groupby, it's size, but I'm -1 on same/similarly named methods with different meaning across groupby/DataFrame. |
plus with enumerate maybe you could provide an iterator or column to cycle through.. |
@cpcloud Interesting idea, not sure how it'd work (if you pass in an iterator it'd be eaten by the first thing/start part way through?). But perhaps mapping after the enumeration would be the way to go (mod and apply a getitem)?
Is there a more efficient way to implement this? Or should it just be a shortcut:
|
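The "map after the enumeration" idea (mod, then a getitem) might look roughly like the sketch below. It uses the `cumcount` name the thread settles on later; the `labels` list and the column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "a", "b", "b", "a"]})
labels = ["x", "y", "z"]  # hypothetical values to cycle through

# Number the rows within each group (0, 1, 2, ...).
position = df.groupby("A").cumcount()

# Wrap the position around the label list: mod, then getitem.
df["label"] = [labels[i % len(labels)] for i in position]
```

Because the cycling happens after the enumeration, no iterator is consumed part way through: every group starts again at `labels[0]`.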
I wonder if enumerate could also mean enumerate by the group it's in (or does that not make sense?) maybe that could be a possible kwarg |
enumerate is something specific in Python (iterator with index), might want As I read through I had this question: how is this different from |
it's enumerating the integer index of the elements in a group |
sort of like a |
(which is a horrible name for this) |
I think enumerate is the right name for this because it's something specific to python... I really like this name. Was wondering if another potentially useful/expected thing for enumerate to do is enumerate the groups (I'm not sure), a little bit like this:
|
Although potentially the groups are not well ordered anyway.... |
(I think they are ordered, so this should be ok, I wonder if these two alternatives could be distinguished with a well chosen kwarg...) |
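To make the two meanings concrete, here is a small sketch on a made-up frame. Numbering the groups themselves is not what this issue implements; in later pandas versions it is available as a separate method, `ngroup`, rather than as a kwarg.

```python
import pandas as pd

df = pd.DataFrame({"A": ["b", "a", "b", "c", "a"]})
g = df.groupby("A")

# Enumerate the items inside each group.
within_group = g.cumcount()

# Enumerate the groups: each row gets the number of the group it is in.
# With the default sort=True, groups are numbered in sorted key order.
group_number = g.ngroup()
```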
From my point of view, a name with the verb 'to count' is the most intuitive; enumerate in Python is not counting. |
I take the point that it is somewhat confusing, but I think that any other option will be significantly more so. This is enumerating each of the "apparitions". count is the wrong word since it usually has the meaning (elsewhere in pandas, e.g. DataFrame.count) of counting the total occurrences (i.e. groupby.size). We are not counting, we are enumerating.
If you replace index with occurrence, this is exactly what we are doing. |
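A small sketch of the distinction being drawn, on a made-up frame and using the `cumcount` name the thread eventually agrees on: `size` gives one total per group, while the enumeration gives one number per row.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b", "a"]})
g = df.groupby("A")

# Counting: total occurrences, one value per group.
totals = g.size()

# Enumerating: the occurrence number of each row within its group.
numbered = g.cumcount()
```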
how about |
I like |
other names: That's all i can think now of |
I checked an English dictionary. Sorry, English is not my first language. |
maybe "appearances" is close to what you were thinking? |
-1 on count or counter (as mentioned above). Also -1 on rolling_count since those functions have windows rather than being cumulative, and actually it does make sense to apply these. I wonder if cumcount makes sense, in line with the already existing cumsum etc. :s (though I dislike it mathematically) I'm unsure about tally, it could be ok but I'm unsure if it's a bit of a colloquialism (I think it's less clear than enumerate)... |
I like EDIT: Actually, after thinking a bit more about |
@ifmihai I like apparitions 👻 😄 |
@TomAugspurger It's always increasing within each group, just like cumcount/enumerate. cumsum is already available:
(and is not increasing) |
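For illustration, a sketch with made-up data (not the snippet from the comment above): a per-group `cumsum` can go down when the values are negative, whereas the per-group enumeration always increases.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "a", "b", "b"],
                   "B": [1, -2, 3, 5, -1]})
g = df.groupby("A")

# Running sum of B within each group; it dips when B is negative.
running_sum = g["B"].cumsum()

# Running count within each group; it only ever goes up.
running_count = g.cumcount()
```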
Actually I think tally suffers from one of the issues that count etc. do: it usually means total (and not cumulative total). If we were to go with cumcount etc. we should make cumsum etc. a bit more visible... |
Yep, I was wrong. Not sure what I was thinking. Should we split the difference and go with |
@jtratner YES! appearances was the word I was searching for. From my perspective (of a foreigner) tally doesn't mean anything, especially in programming.
@cpcloud ha ha! anyway, apparitions are interesting :)
some other words that can be used (maybe):
personally I prefer appearances or track_appearances
I guess it will not be so much a used function, so I guess the name can be longer if needed, right? |
-1 for If I would describe the action, I would say I number the items within the group (give them a number), so maybe |
Let's go with cumcount then.
|
+1 for cumcount |
+1 here too |
cumcount it is, now in master/0.13. |
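A minimal usage sketch of the method as it exists in current pandas (the frame is made up; the `ascending` keyword is taken from the current documentation, so check it against the version you have installed):

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "a", "b", "b", "a"]})
g = df.groupby("A")

# Number each item from 0 up to len(group) - 1, in order of appearance.
forward = g.cumcount()

# Number each item in reverse, from len(group) - 1 down to 0.
backward = g.cumcount(ascending=False)
```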
One annoying thing I've realised is as_index: if you pass a groupby which is as_index, should it include that in the result's index? Note filter doesn't (should it??), most other things do... or try to at least. Observe:
thoughts? |
So what's the question? :) ps. |
@ifmihai I think I'll put out this question to a more general one about as_index consistency. It's a little strange as nth does a different thing too. Difference is when you look at the index of the above results, the head has A prepended to the index... cumcount is not a Series method, what were you thinking it would do? sugar for s.groupby(s).cumcount() ? |
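The "sugar" spelled out, as a sketch on a made-up Series: grouping a Series by its own values and enumerating gives the occurrence number of each value.

```python
import pandas as pd

s = pd.Series(["x", "y", "x", "x", "y"])

# Group the Series by its own values, then number each occurrence.
occurrence = s.groupby(s).cumcount()
```

The result keeps the original index of `s`, so it can be assigned straight back next to the data.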
@hayd when you have a chance, can you add a |
I was wondering if index should be:
This came up as I was trying to tweak nth, but got into a muddle with what that does to get its index. |
@hayd I use a separate function now, like cumcount(), to count a Series, or a column in a df. I wasn't even thinking about df.groupby() up to this thread. Right now I don't see much use for it through groupby(), as I don't have use cases in mind. Now back to the original question, with as_index, ps. I cannot work with 0.13 right now (I don't know how to play with separate environments) |
I'm not sure what a good word for this is (count is taken, order means sort)!
But it's quite an often used thing to create a column which enumerates the items in each group / counts their occurrences.
You can hack it:
I've seen this in a few SO questions, here's just one.
cc @cpcloud (and I've seen @jreback answer a question with this)
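One possible version of such a hack, as a sketch only (not necessarily the snippet from the original post): take a running sum over a helper Series of ones within each group. The frame and the column name `n` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "b", "a", "b"]})

# Helper Series of ones, aligned with the frame's index.
ones = pd.Series(1, index=df.index)

# The running sum of ones per group, minus 1, numbers the items from 0.
df["n"] = ones.groupby(df["A"]).cumsum() - 1
```

This gives the same numbers as the `cumcount` method the thread ends up adding, without the helper Series.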