Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement+test mean for datetimelike EA/Index/Series #24757

Merged
merged 44 commits into from
Jun 10, 2019

Conversation

jbrockmendel
Copy link
Member

This takes over from #23890, implementing+testing reductions one at a time.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@gfyoung gfyoung added Testing pandas testing functions or related to the test suite Datetime Datetime data dtype ExtensionArray Extending pandas with custom dtypes or arrays. labels Jan 13, 2019
@gfyoung gfyoung added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff ExtensionArray Extending pandas with custom dtypes or arrays. and removed ExtensionArray Extending pandas with custom dtypes or arrays. labels Jan 13, 2019
@jreback
Copy link
Contributor

jreback commented Jan 13, 2019

we are still treating datetimes as non-numeric yes? meaning that by default they will be exlcuded from say .mean() and you have to pass numeric_only=False?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you see if we have coverage of this on mixed dtype frames as well? (see my comment about numeric_only). Also need to assert that other ops automatically exclude (e.g. std).

@codecov
Copy link

codecov bot commented Jan 14, 2019

Codecov Report

Merging #24757 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24757      +/-   ##
==========================================
+ Coverage   92.38%   92.38%   +<.01%     
==========================================
  Files         166      166              
  Lines       52363    52378      +15     
==========================================
+ Hits        48376    48391      +15     
  Misses       3987     3987
Flag Coverage Δ
#multiple 90.81% <100%> (ø) ⬆️
#single 42.91% <20%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/datetimelike.py 97.72% <100%> (+0.04%) ⬆️
pandas/core/indexes/datetimelike.py 98.52% <100%> (ø) ⬆️
pandas/core/series.py 93.69% <100%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 453fa85...30eeb64. Read the comment docs.

@codecov
Copy link

codecov bot commented Jan 14, 2019

Codecov Report

Merging #24757 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #24757      +/-   ##
==========================================
- Coverage   91.88%   91.87%   -0.01%     
==========================================
  Files         174      174              
  Lines       50692    50708      +16     
==========================================
+ Hits        46576    46588      +12     
- Misses       4116     4120       +4
Flag Coverage Δ
#multiple 90.41% <100%> (ø) ⬆️
#single 41.79% <17.64%> (-0.1%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/datetimes.py 96.36% <ø> (ø) ⬆️
pandas/core/indexes/timedeltas.py 90.96% <ø> (ø) ⬆️
pandas/core/arrays/datetimelike.py 97.93% <100%> (+0.04%) ⬆️
pandas/core/indexes/datetimelike.py 98.14% <100%> (ø) ⬆️
pandas/core/series.py 93.62% <100%> (+0.01%) ⬆️
pandas/io/gbq.py 78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py 97% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d124ea...111c345. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need the tests from the OP (e.g. a mixed DataFrame)

pandas/core/series.py Show resolved Hide resolved
@TomAugspurger
Copy link
Contributor

we are still treating datetimes as non-numeric yes? meaning that by default they will be exlcuded from say .mean() and you have to pass numeric_only=False?

I think that makes the most sense.

@TomAugspurger
Copy link
Contributor

Does this close #24752?

@jbrockmendel
Copy link
Member Author

Does this close #24752?

I don't think so, will need to double-check.

@jbrockmendel
Copy link
Member Author

No, no it does not.

pandas/core/series.py Show resolved Hide resolved
@@ -47,6 +47,27 @@ def test_ops_properties_basic(self):
assert s.day == 10
pytest.raises(AttributeError, lambda: s.weekday)

# TODO: figure out where in tests.reductions this belongs
def test_mean(self, tz_naive_fixture):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems very duplicative of the test below, is there a reason?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bc at the time when this was written Index reduction tests had not been collected in tests.reductions. Will move/de-duplicate.

@jbrockmendel
Copy link
Member Author

I guess these have to be uniform periods so mean is just another period, though I think this could easily be a non-sensical return values, e.g. what is the mean of pd.date_range('2012', freq='M', periods=4)

Eyeballing it at about 2012-03-14. Why would that be nonsensical?

@jreback
Copy link
Contributor

jreback commented Jan 16, 2019

it’s no longer a M freq Period. my point is the mean if a span is an odd thing

@jorisvandenbossche
Copy link
Member

@jbrockmendel instead of keeping rebasing this, can you actually answer to my request for changes? (unless it is just lack of time that you didn't make any changes yet, that's no problem)

@jreback
Copy link
Contributor

jreback commented Apr 20, 2019

can you merge master and fix the conflicts.

@jorisvandenbossche this has been outstanding a while, any remaining issues?

@jreback
Copy link
Contributor

jreback commented Apr 20, 2019

@jbrockmendel also pls respond to open questions from @jorisvandenbossche

@@ -1435,6 +1435,50 @@ def max(self, axis=None, skipna=True, *args, **kwargs):
# Don't have to worry about NA `result`, since no NA went in.
return self._box_func(result)

def mean(self, axis=None, skipna=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don'e we also need out=? meaning we should just accept kwargs like we do for Series.mean

Parameters
----------
axis : None
Dummy parameter to match NumPy signature
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would just add kwargs for this purpose; we don't add them explicity anywhere else

@jreback
Copy link
Contributor

jreback commented May 12, 2019

can you merge master; this looks fine otherwise, though I think @jorisvandenbossche had a question.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems good for now. There's still an outstanding discussion about accepting additional arguments so that methods like np.mean work correctly. These are

  • axis
  • dtype
  • out
  • keepdims (optional)

but I'm fine with leaving that as a followup when someone wants it.

@jorisvandenbossche
Copy link
Member

Given the lack of response of @jbrockmendel, I added a commit that removes the axis keyword (single commit, so can be easily removed again if that is a problem).
Rationale: since we don't do any compat-handling of numpy keyword arguments, let's also not do it for axis. We can always add that later.

Fine with merging it like this.

@jreback
Copy link
Contributor

jreback commented May 14, 2019

ok, pls merge on green. @jorisvandenbossche I guess worthwile to create a followup issue.

@jbrockmendel
Copy link
Member Author

Given the lack of response of @jbrockmendel,

My bad. Starting next month I'm working FT on pandas, so until then I'm focusing on statsmodels and spending time outdoors. Will rebase momentarily.

@jreback
Copy link
Contributor

jreback commented May 16, 2019

I think this will pass if you merge master?

@jorisvandenbossche
Copy link
Member

until then I'm focusing on statsmodels and spending time outdoors.

Enjoy your time off!


Failures were actually real ones: pushed fixes for a problem with the axis (that I caused) and a docstring lint error

@jreback
Copy link
Contributor

jreback commented Jun 3, 2019

@jbrockmendel one more merge master pls.

@jbrockmendel
Copy link
Member Author

gentle ping

@jreback jreback merged commit efc7f2f into pandas-dev:master Jun 10, 2019
@jreback
Copy link
Contributor

jreback commented Jun 10, 2019

thanks @jbrockmendel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Datetime Datetime data dtype ExtensionArray Extending pandas with custom dtypes or arrays. Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants