CLN: pd.TimeGrouper #26477

Merged 3 commits on May 24, 2019

mroeschke
Member

@mroeschke added the Clean and Datetime (Datetime data dtype) labels on May 21, 2019
@mroeschke added this to the 0.25.0 milestone on May 21, 2019
codecov bot commented May 21, 2019

Codecov Report

Merging #26477 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #26477      +/-   ##
==========================================
- Coverage   91.75%   91.74%   -0.01%     
==========================================
  Files         174      174              
  Lines       50765    50759       -6     
==========================================
- Hits        46578    46568      -10     
- Misses       4187     4191       +4
Flag        Coverage Δ
#multiple   90.25% <ø> (-0.01%) ⬇️
#single     41.72% <ø> (-0.09%) ⬇️

Impacted Files         Coverage Δ
pandas/core/api.py     100% <ø> (ø) ⬆️
pandas/io/gbq.py       78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py   97.02% <0%> (-0.12%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f5cc078...caedf0d.

codecov bot commented May 21, 2019

Codecov Report

Merging #26477 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #26477      +/-   ##
==========================================
+ Coverage   91.75%   91.75%   +<.01%     
==========================================
  Files         174      174              
  Lines       50765    50673      -92     
==========================================
- Hits        46578    46494      -84     
+ Misses       4187     4179       -8
Flag        Coverage Δ
#multiple   90.26% <ø> (ø) ⬆️
#single     41.7% <ø> (-0.11%) ⬇️

Impacted Files                      Coverage Δ
pandas/core/api.py                  100% <ø> (ø) ⬆️
pandas/io/gbq.py                    78.94% <0%> (-10.53%) ⬇️
pandas/core/dtypes/common.py        95.85% <0%> (-1.6%) ⬇️
pandas/core/indexes/timedeltas.py   90.96% <0%> (-0.63%) ⬇️
pandas/core/indexes/datetimes.py    96.36% <0%> (-0.49%) ⬇️
pandas/core/indexes/frozen.py       91.78% <0%> (-0.33%) ⬇️
pandas/core/frame.py                97% <0%> (-0.14%) ⬇️
pandas/core/sparse/series.py        93.18% <0%> (-0.13%) ⬇️
pandas/plotting/_core.py            83.77% <0%> (-0.13%) ⬇️
pandas/core/internals/managers.py   93.87% <0%> (-0.06%) ⬇️
... and 14 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f5cc078...d62a80f.

@@ -11,6 +11,7 @@
import pandas as pd
from pandas import DataFrame, Index, MultiIndex, Series, Timestamp, date_range
from pandas.core.groupby.ops import BinGrouper
from pandas.core.resample import TimeGrouper
Member

Shouldn't we remove this entirely from resample as well?

Member Author

Internally, TimeGrouper still holds a lot of the core metadata for resampling. TimeGrouper isn't needed at the top level because of this shortcut in Grouper:

    def __new__(cls, *args, **kwargs):
        if kwargs.get('freq') is not None:
            from pandas.core.resample import TimeGrouper
            cls = TimeGrouper
        return super().__new__(cls)

So AFAICT, the internal TimeGrouper is still needed.
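
To make the shortcut above concrete, here is a minimal, pandas-free sketch of the same `__new__` dispatch pattern. `ToyGrouper` and `ToyTimeGrouper` are hypothetical stand-ins invented for illustration, not the pandas classes; the sketch only assumes that the time-based class subclasses the general one, which is what lets `__init__` still run on the swapped instance.

```python
class ToyGrouper:
    """Hypothetical stand-in for the public, user-facing grouper class."""

    def __new__(cls, *args, **kwargs):
        # Same idea as the shortcut quoted above: when a frequency is
        # passed by keyword, build an instance of the time-aware subclass.
        if kwargs.get("freq") is not None:
            cls = ToyTimeGrouper
        return super().__new__(cls)

    def __init__(self, key=None, freq=None):
        self.key = key
        self.freq = freq


class ToyTimeGrouper(ToyGrouper):
    """Hypothetical stand-in for the internal class holding resampling metadata."""


plain = ToyGrouper(key="date")
timed = ToyGrouper(key="date", freq="D")

print(type(plain).__name__)  # ToyGrouper
print(type(timed).__name__)  # ToyTimeGrouper
```

Because the caller only ever names the general class, the time-aware class can stay internal without being exported at the top level, which is the situation described in the comment above.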

Contributor

See my comments below; we don't want to use this in the user-facing tests.

Member

Would there be any objection to replacing TimeGrouper internally as well? I'm always looking to reduce GroupBy complexity, so getting rid of an entire class would be helpful.

Contributor

I doubt this would be easy; TimeGrouper does a lot of stuff.

Member

Figured as much. Even without removing the full class, it would still be nice to remove any methods that are now unused internally and keep paring down the groupby code. If I see any opportunities, I'll push a PR.

Member Author

Definitely +1 for having Grouper adopt TimeGrouper code. Should be possible.

@@ -365,10 +366,8 @@ def sumfunc_value(x):
return x.value.sum()

expected = df.groupby(pd.Grouper(key='date')).apply(sumfunc_value)
Contributor

Can you change these to use Grouper (here and below)?

@jreback merged commit 374478c into pandas-dev:master on May 24, 2019
Contributor

jreback commented May 24, 2019

thanks @mroeschke

@mroeschke deleted the remove_timegrouper branch on May 25, 2019 16:05
another-green pushed a commit to another-green/pandas that referenced this pull request May 29, 2019