DataFrame.mean() ignores datetime series #28108

BlaneG · 2019-08-23T03:16:57Z

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd
In [2]: from datetime import datetime

In [3]: s = pd.Series([datetime(2014, 7, 9), 
   ...:            datetime(2014, 7, 10), 
   ...:            datetime(2014, 7, 11)])

In [4]: df = pd.DataFrame({'numeric':[1,2,3],
   ...:               'datetime':s})

In [5]: df.mean()
Out[5]: 
numeric    2.0
dtype: float64

In [6]: s.mean()
Out[6]: Timestamp('2014-07-10 00:00:00')

Problem description

As of pandas 0.25 it is possible to apply mean() to a datetime series. However, DataFrame.mean() silently drops datetime columns rather than returning their mean, as one might expect.

Expected Output

When axis=0, output could be a dataframe with dtype as the first row, and the value as the second row.
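A rough sketch of the output shape described above, computing each column's mean separately with Series.mean() (which handles datetimes as of 0.25). `columnwise_mean` is a hypothetical helper written for illustration, not a pandas API, and the row labels are assumptions.

```python
from datetime import datetime

import pandas as pd

s = pd.Series([datetime(2014, 7, 9),
               datetime(2014, 7, 10),
               datetime(2014, 7, 11)])
df = pd.DataFrame({'numeric': [1, 2, 3],
                   'datetime': s})


def columnwise_mean(frame):
    """Hypothetical helper: one column per input column, with the
    column dtype (as a string) in the first row and its mean in the second."""
    return pd.DataFrame({
        name: {'dtype': str(col.dtype), 'mean': col.mean()}
        for name, col in frame.items()
    })


result = columnwise_mean(df)
print(result)
```

Because each column is reduced on its own, the datetime column never has to be combined with the numeric one.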

TomAugspurger · 2019-08-23T14:11:29Z

Can you check out the PR where this was implemented? I think there was some discussion about what to do with DataFrame.

TomAugspurger · 2019-08-23T20:56:23Z

#24757 (comment)

So it seems like you'll need numeric_only=False. However, that still fails, as numeric_only does bad things:

In [38]: df.mean(numeric_only=False)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: unsupported operand type(s) for +: 'Timestamp' and 'Timestamp'

In [39]: df[['datetime']].mean(numeric_only=False)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-39-74680e367368> in <module>
----> 1 df[['datetime']].mean(numeric_only=False)

~/sandbox/pandas/pandas/core/generic.py in stat_func(self, axis, skipna, level, numeric_only, **kwargs)
  11577             return self._agg_by_level(name, axis=axis, level=level, skipna=skipna)
  11578         return self._reduce(
> 11579             f, name, axis=axis, skipna=skipna, numeric_only=numeric_only
  11580         )
  11581

~/sandbox/pandas/pandas/core/frame.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   7905             else:
   7906                 values = self.values
-> 7907             result = f(values)
   7908
   7909         if hasattr(result, "dtype") and is_object_dtype(result.dtype):

~/sandbox/pandas/pandas/core/frame.py in f(x)
   7826
   7827         def f(x):
-> 7828             return op(x, axis=axis, skipna=skipna, **kwds)
   7829
   7830         # exclude timedelta/datetime unless we are uniform types

~/sandbox/pandas/pandas/core/nanops.py in _f(*args, **kwargs)
     63             if any(self.check(obj) for obj in obj_iter):
     64                 msg = "reduction operation {name!r} not allowed for this dtype"
---> 65                 raise TypeError(msg.format(name=f.__name__.replace("nan", "")))
     66             try:
     67                 with np.errstate(invalid="ignore"):

TypeError: reduction operation 'mean' not allowed for this dtype

Both calls end up doing df.values, rather than applying the reduction per block.
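To illustrate the df.values point, here is a small check of what dtype the reduction would see in each case. This is only a sketch of the conversion step; it does not reproduce the internal _reduce code path, and the variable names are made up for the example.

```python
from datetime import datetime

import pandas as pd

s = pd.Series([datetime(2014, 7, 9),
               datetime(2014, 7, 10),
               datetime(2014, 7, 11)])
df = pd.DataFrame({'numeric': [1, 2, 3],
                   'datetime': s})

# Mixed frame: .values has to fall back to a single object array,
# so the reduction sees raw Timestamp objects next to integers.
mixed_dtype = df.values.dtype

# Datetime-only frame: .values is a datetime64 array.
datetime_only_dtype = df[['datetime']].values.dtype

# Reducing the column on its own avoids the 2-D conversion entirely.
column_mean = df['datetime'].mean()

print(mixed_dtype, datetime_only_dtype, column_mean)
```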

jbrockmendel · 2019-08-23T21:09:27Z

IIRC we had a PR that fixed nanmean for datetime64, but we reverted it because it changed the behavior of df.mean. I think we were going to do a deprecation cycle.

jbrockmendel · 2020-09-21T22:10:30Z

Closed by #29941

jbrockmendel self-assigned this Oct 16, 2019

jbrockmendel added the "Numeric Operations" (Arithmetic, Comparison, and Logical operations) label Oct 16, 2019

TomAugspurger mentioned this issue Jan 21, 2020

Mean implementation for datetime series dask/dask#5794


jbrockmendel added the "Reduction Operations" (sum, mean, min, max, etc.) and "Nuisance Columns" (Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply) labels Sep 21, 2020

jbrockmendel closed this as completed Sep 21, 2020
