-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG/API: allow TimeGrouper with other columns in a groupby (GH3794) #6516
Conversation
@cpcloud @hayd @jorisvandenbossche lmk what you think about the way to specify what the TimeGrouper applies (as maybe we should use this elsewhere), eg #5677 |
maybe what would resolve both of these is to allow something like a where this example could be written as: so if you want to specify something:
you would groupby: a level named so
|
This looks very useful. Just to make sure, you've only got the grouping by index level and column name working with the special case of a time grouper right? I made some progress on #5677, but it's really messy. IIRC, the code for determining the grouping is really tightly tied to building the groups. Maybe I'll look at that again today and push something up. Then we can think about out to handle this also. |
tom that's right a construct like G solves the problem and/or can try to figure it out but for TimeGrouper it's necessary (you can figure it out too but it's more ambiguous and should be explicitly specified) |
@TomAugspurger @hayd anybody like/dislike the |
I really like the syntax. Only issue I see is if G is too short/not descriptive... |
|
annoyingly both these already exist in groupby |
I know, but they are internal, so could rename how about just |
I was thinking |
The functionality here would really be a big improvement!
I was wondering, are there other places users would at the moment use |
This is going to become the base class of TimeGrouper; its not really any different, but would allow one to disambiguate things.
|
updated.... @TomAugspurger @jorisvandenbossche @hayd @cpcloud take a look when you have a chance (top section is updated) This is actually not a very big change in implementation (e.g. determining how to group is still very complicate). But a bit cleaner I think. Could be cleaned up a bit more, e.g. a Categorical could be integrated into the Grouper hierarchy now (i'll open an issue for that), |
Looks good. When I groupby by just a
I think we want to support this case, right? |
that's a bug |
I really like this change. |
rename internally Grouper to BaseGrouper to avoid conflict TimeGrouper to now inherit from Grouper
well....turns out had to refactor a bit to get a nice inteface, whoosh. Upside is I actually understand grouping now. I put in lots of comments to explain, so should be easy to build on this (hopefully). @TomAugspurger I think #5677 should be very straightforward on top of this and/or may not be necessary (as you can do it now, maybe just expand the docs section a tiny bit). |
in a more elegant / cleaner way by keeping internal groupby state inside the Grouper rather than passing around lots of results DOC: minor doc edits for groupby.rst / v0.14.0 PEP8: minor pep changes
BUG/API: allow TimeGrouper with other columns in a groupby (GH3794)
if obj.index.name != level: | ||
raise ValueError('level name %s is not the name of the ' | ||
'index' % level) | ||
elif level > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback is this elif
supposed to be shifted left to be under the if
at line 226?
@TomAugspurger if prob could but just catching an invalid level that is passed str a multi-index prob could be combined |
@@ -125,6 +125,8 @@ API Changes | |||
``DataFrame.stack`` operations where the name of the column index is used as | |||
the name of the inserted column containing the pivoted data. | |||
|
|||
- Allow specification of a more complex groupby, via ``pd.Groupby`` (:issue:`3794`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pdf.Groupby -> pd.Grouper?
Some after merge comments:
|
i'll fix the docs....regarding your 2nd point. none of the methods are meant to be public, but this is a 'private' class except for the actual constructor, so not sure what to do about that. any suggestions? |
just make all methods |
|
Can you try to keep to the numpydoc format?
The examples will also be rendered nicely (on the html pages) if you do eg:
|
don't want to make any methods internal. They are all 'internal' for the most part. I guess this could be changed but its a bit of a project in itself. |
|
What do you mean with 'make' them internal? What I meant was just simply renaming them eg |
Another formatting comment: the explanation in
does not have to be aligned with Something else, in |
addressed in #6655 |
@jreback Something else, |
no Grouper will create a TimeGrouper if passed a freq and TimeGrouper is a subclass of Grouper so any kw args are passed thru their are some more args that are mainly used by resample but I am not sure that you care about for grouping eg base,loffset don't matter when grouping closed might though - if so I will add to Grouping docs |
OK, Thanks! |
closes #3794