Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement+test mean for datetimelike EA/Index/Series #24757

Merged
merged 44 commits into from
Jun 10, 2019
Merged
Show file tree
Hide file tree
Changes from 28 commits
Commits
Show all changes
44 commits
Select commit Hold shift + click to select a range
434e2cd
implement+test mean for datetimelike EA/Index/Series
jbrockmendel Jan 13, 2019
c8abf33
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Jan 14, 2019
30eeb64
update imports
jbrockmendel Jan 14, 2019
d48e2ef
isort fixup
jbrockmendel Jan 14, 2019
0e32be2
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Jan 14, 2019
231458d
params for docstring
jbrockmendel Jan 14, 2019
1129e8c
test for numeric_only
jbrockmendel Jan 14, 2019
38f829a
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Jan 16, 2019
aba90ec
copy/paste fixup
jbrockmendel Jan 16, 2019
176b355
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Jan 17, 2019
ccb790b
Disable for PeriodArray
jbrockmendel Jan 17, 2019
4f4cb6d
Delete assertions missed in previous commit
jbrockmendel Jan 17, 2019
ec83db1
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Jan 17, 2019
5fb1db9
xfail numeric_only=False case
jbrockmendel Jan 17, 2019
028c789
Merge branch 'means' of https://github.com/jbrockmendel/pandas into m…
jbrockmendel Jan 17, 2019
50e714e
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Jan 29, 2019
1b24e7d
add todo comment
jbrockmendel Jan 29, 2019
4d40906
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Jan 29, 2019
a49da37
dont expect datetime in dataframe.mean
jbrockmendel Jan 29, 2019
94d3466
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Feb 1, 2019
e4e6a03
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Feb 7, 2019
7953a7b
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Feb 11, 2019
15307da
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Feb 18, 2019
da719e1
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Feb 22, 2019
637b415
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Feb 28, 2019
58bca36
whatsnew
jbrockmendel Feb 28, 2019
abcb87a
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Mar 3, 2019
4df0b1c
add versionadded
jbrockmendel Mar 3, 2019
c9736d7
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Mar 9, 2019
d2f5e6f
change NotImplementedError to TypeError
jbrockmendel Mar 9, 2019
de9025c
add mean to class docstrings and docs/source/reference/indexing.rst
jbrockmendel Mar 9, 2019
4c553a9
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Mar 14, 2019
330fc41
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Mar 20, 2019
7c6201b
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Apr 5, 2019
642d4e2
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Apr 12, 2019
7b9fd42
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Apr 20, 2019
4be012f
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel May 12, 2019
5682b65
remove axis keyword for now
jorisvandenbossche May 14, 2019
581ff1a
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel May 14, 2019
34a83a4
Merge branch 'means' of https://github.com/jbrockmendel/pandas into m…
jbrockmendel May 14, 2019
14150ee
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel May 16, 2019
450c8ce
don't pass axis to methods
jorisvandenbossche May 16, 2019
3e31ca1
add returns to docstring
jorisvandenbossche May 16, 2019
111c345
Merge branch 'master' of https://github.com/pandas-dev/pandas into means
jbrockmendel Jun 3, 2019
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@ Other Enhancements
- ``Series.str`` has gained :meth:`Series.str.casefold` method to removes all case distinctions present in a string (:issue:`25405`)
- :meth:`DataFrame.set_index` now works for instances of ``abc.Iterator``, provided their output is of the same length as the calling frame (:issue:`22484`, :issue:`24984`)
- :meth:`DatetimeIndex.union` now supports the ``sort`` argument. The behaviour of the sort parameter matches that of :meth:`Index.union` (:issue:`24994`)
- :class:`DatetimeIndex` and :class:`TimedeltaIndex` now have a `mean` method (:issue:`24757`)
TomAugspurger marked this conversation as resolved.
Show resolved Hide resolved
-

.. _whatsnew_0250.api_breaking:
Expand Down
44 changes: 44 additions & 0 deletions pandas/core/arrays/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -1435,6 +1435,50 @@ def max(self, axis=None, skipna=True, *args, **kwargs):
# Don't have to worry about NA `result`, since no NA went in.
return self._box_func(result)

def mean(self, axis=None, skipna=True):
jreback marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to add an axis argument here?
I know we have already added in some places, but also not added in other places, so probably something we should discuss.

I would personally try to not add too much overhead in additional useless kwargs just for matching signatures, if they can be avoided.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, this lets us hook into np.mean, but we should verify exactly which kwargs are necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's why we did it in the past (but for that we also need the additional kwargs). But, for EAs, we could also decide to implement the array function protocol to achieve this goal, which might obviate the need to add the additional kwargs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel i think its worthile here to allow calling np.mean

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don'e we also need out=? meaning we should just accept kwargs like we do for Series.mean

"""
Return the mean value of the Array or mean along an axis.

.. versionadded:: 0.25.0

jorisvandenbossche marked this conversation as resolved.
Show resolved Hide resolved
Parameters
----------
axis : None
Dummy parameter to match NumPy signature
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is to match numpy's signature, then there are a bunch of other kwargs to add as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right and we do this elsewhere

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would just add kwargs for this purpose; we don't add them explicity anywhere else

skipna : bool, default True
Whether to ignore any NaT elements

TomAugspurger marked this conversation as resolved.
Show resolved Hide resolved
See Also
--------
numpy.ndarray.mean
Series.mean : Return the mean value in a Series.
"""
if is_period_dtype(self):
# See discussion in GH#24757
raise NotImplementedError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NotImplementedError seems to indicate that we just not yet implemented it, while here we actually explicitly don't want to calculate it. Would a TypeError be more suitable?

"mean is not implemented for {cls} since the meaning may be "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"mean is not implemented for {cls} since the meaning may be "
"mean is not implemented for {cls} since the meaning is "

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing here is that it isn't ambiguous for Tick-frequencies, so "may be" is more accurate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the unambiguous result of the mean of this?

In [12]: idx = pd.period_range('2012-01-01', periods=4, freq='1s')

In [13]: idx
Out[13]: 
PeriodIndex(['2012-01-01 00:00:00', '2012-01-01 00:00:01',
             '2012-01-01 00:00:02', '2012-01-01 00:00:03'],
            dtype='period[S]', freq='S')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Period('2012-01-01 00:00:01', freq='S'). We already discussed rounding error in this thread.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We mainly discussed that for datetime64, where I am fine with that.
But for Period, you implicitly used the start_times for taking the mean, while you could also have taken the end_times. So I still find this ambiguous.

(anyway, I think we agree to disallow it all-together for all Periods, so this is only about the error message)

"ambiguous. An alternative is "
"obj.to_timestamp(how='start').mean()"
.format(cls=type(self).__name__))

nv.validate_minmax_axis(axis)

mask = self.isna()
if skipna:
values = self[~mask]
elif mask.any():
return NaT
else:
values = self

if not len(values):
# short-circut for empty max / min
return NaT

result = nanops.nanmean(values.view('i8'), skipna=skipna)
# Don't have to worry about NA `result`, since no NA went in.
return self._box_func(result)


# -------------------------------------------------------------------
# Shared Constructor Helpers
Expand Down
1 change: 1 addition & 0 deletions pandas/core/indexes/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -72,6 +72,7 @@ class DatetimeIndexOpsMixin(ExtensionOpsMixin):
_maybe_mask_results = ea_passthrough(
DatetimeLikeArrayMixin._maybe_mask_results)
__iter__ = ea_passthrough(DatetimeLikeArrayMixin.__iter__)
mean = ea_passthrough(DatetimeLikeArrayMixin.mean)

@property
def freq(self):
Expand Down
4 changes: 4 additions & 0 deletions pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -3686,6 +3686,10 @@ def _reduce(self, op, name, axis=0, skipna=True, numeric_only=None,
elif is_datetime64_dtype(delegate):
# use DatetimeIndex implementation to handle skipna correctly
delegate = DatetimeIndex(delegate)
elif is_timedelta64_dtype(delegate) and hasattr(TimedeltaIndex, name):
jreback marked this conversation as resolved.
Show resolved Hide resolved
# use TimedeltaIndex to handle skipna correctly
# TODO: remove hasattr check after TimedeltaIndex has `std` method
delegate = TimedeltaIndex(delegate)

# dispatch to numpy arrays
elif isinstance(delegate, np.ndarray):
Expand Down
41 changes: 41 additions & 0 deletions pandas/tests/frame/test_analytics.py
Original file line number Diff line number Diff line change
Expand Up @@ -1194,6 +1194,47 @@ def test_mean_corner(self, float_frame, float_string_frame):
means = float_frame.mean(0)
assert means['bool'] == float_frame['bool'].values.mean()

def test_mean_datetimelike(self):
# GH#24757 check that datetimelike are excluded by default, handled
# correctly with numeric_only=True

df = pd.DataFrame({
'A': np.arange(3),
'B': pd.date_range('2016-01-01', periods=3),
'C': pd.timedelta_range('1D', periods=3),
'D': pd.period_range('2016', periods=3, freq='A')
})
result = df.mean(numeric_only=True)
expected = pd.Series({'A': 1.})
tm.assert_series_equal(result, expected)

result = df.mean()
expected = pd.Series({
'A': 1.,
'C': df.loc[1, 'C']
})
tm.assert_series_equal(result, expected)

@pytest.mark.xfail(reason="casts to object-dtype and then tries to "
"add timestamps",
raises=TypeError, strict=True)
def test_mean_datetimelike_numeric_only_false(self):
df = pd.DataFrame({
'A': np.arange(3),
'B': pd.date_range('2016-01-01', periods=3),
'C': pd.timedelta_range('1D', periods=3),
'D': pd.period_range('2016', periods=3, freq='A')
})

result = df.mean(numeric_only=False)
expected = pd.Series({
'A': 1,
'B': df.loc[1, 'B'],
'C': df.loc[1, 'C'],
'D': df.loc[1, 'D']
})
tm.assert_series_equal(result, expected)

def test_stats_mixed_type(self, float_string_frame):
# don't blow up
float_string_frame.std(1)
Expand Down
69 changes: 69 additions & 0 deletions pandas/tests/reductions/test_stat_reductions.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,78 @@

import pandas as pd
from pandas import DataFrame, Series, compat
from pandas.core.arrays import DatetimeArray, PeriodArray, TimedeltaArray
import pandas.util.testing as tm


class TestDatetimeLikeStatReductions(object):

@pytest.mark.parametrize('box', [Series, pd.Index, DatetimeArray])
def test_dt64_mean(self, tz_naive_fixture, box):
tz = tz_naive_fixture

dti = pd.date_range('2001-01-01', periods=11, tz=tz)
# shuffle so that we are not just working with monotone-increasing
dti = dti.take([4, 1, 3, 10, 9, 7, 8, 5, 0, 2, 6])
dtarr = dti._data

obj = box(dtarr)
assert obj.mean() == pd.Timestamp('2001-01-06', tz=tz)
assert obj.mean(skipna=False) == pd.Timestamp('2001-01-06', tz=tz)

# dtarr[-2] will be the first date 2001-01-1
dtarr[-2] = pd.NaT

obj = box(dtarr)
assert obj.mean() == pd.Timestamp('2001-01-06 07:12:00', tz=tz)
assert obj.mean(skipna=False) is pd.NaT

@pytest.mark.parametrize('box', [Series, pd.Index, PeriodArray])
def test_period_mean(self, box):
# GH#24757
dti = pd.date_range('2001-01-01', periods=11)
# shuffle so that we are not just working with monotone-increasing
dti = dti.take([4, 1, 3, 10, 9, 7, 8, 5, 0, 2, 6])

# use hourly frequency to avoid rounding errors in expected results
jreback marked this conversation as resolved.
Show resolved Hide resolved
# TODO: flesh this out with different frequencies
parr = dti._data.to_period('H')
obj = box(parr)
with pytest.raises(NotImplementedError, match="ambiguous"):
obj.mean()
with pytest.raises(NotImplementedError, match="ambiguous"):
obj.mean(skipna=True)

# parr[-2] will be the first date 2001-01-1
parr[-2] = pd.NaT

with pytest.raises(NotImplementedError, match="ambiguous"):
obj.mean()
with pytest.raises(NotImplementedError, match="ambiguous"):
obj.mean(skipna=True)

@pytest.mark.parametrize('box', [Series, pd.Index, TimedeltaArray])
def test_td64_mean(self, box):
tdi = pd.TimedeltaIndex([0, 3, -2, -7, 1, 2, -1, 3, 5, -2, 4],
unit='D')

tdarr = tdi._data
obj = box(tdarr)

result = obj.mean()
expected = np.array(tdarr).mean()
assert result == expected

tdarr[0] = pd.NaT
assert obj.mean(skipna=False) is pd.NaT

result2 = obj.mean(skipna=True)
assert result2 == tdi[1:].mean()

# exact equality fails by 1 nanosecond
assert result2.round('us') == (result * 11. / 10).round('us')


class TestSeriesStatReductions(object):
# Note: the name TestSeriesStatReductions indicates these tests
# were moved from a series-specific test file, _not_ that these tests are
Expand Down