
DataFrame.mean() ignores datetime series #28108

Closed
BlaneG opened this issue Aug 23, 2019 · 4 comments
Labels: Nuisance Columns (Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply), Numeric Operations (Arithmetic, Comparison, and Logical operations), Reduction Operations (sum, mean, min, max, etc.)

Comments

@BlaneG

BlaneG commented Aug 23, 2019

Code Sample

In [1]: import pandas as pd
In [2]: from datetime import datetime

In [3]: s = pd.Series([datetime(2014, 7, 9), 
   ...:            datetime(2014, 7, 10), 
   ...:            datetime(2014, 7, 11)])

In [4]: df = pd.DataFrame({'numeric':[1,2,3],
   ...:               'datetime':s})

In [5]: df.mean()
Out[5]: 
numeric    2.0
dtype: float64

In [6]: s.mean()
Out[6]: Timestamp('2014-07-10 00:00:00')

Problem description


As of pandas 0.25, mean() can be applied to a datetime Series. However, DataFrame.mean() silently drops datetime columns instead of returning their mean, as one might expect.

Expected Output

With axis=0, the output could be a DataFrame whose first row gives each column's dtype and whose second row gives the mean.
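Until DataFrame.mean() handles this, one possible workaround is to reduce each column separately and collect the results into an object-dtype Series. This is a minimal sketch, assuming pandas 0.25 or later (where Series.mean() supports datetime64); the variable name `means` is illustrative, and the result is a Series rather than the two-row DataFrame suggested as the expected output.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({
    "numeric": [1, 2, 3],
    "datetime": [datetime(2014, 7, 9), datetime(2014, 7, 10), datetime(2014, 7, 11)],
})

# Reduce each column on its own so the datetime column stays datetime64[ns]
# (where mean() is defined) instead of being mixed with the numeric column.
means = pd.Series({col: df[col].mean() for col in df.columns})
print(means)
```

Because the values have different types (a float and a Timestamp), the collected Series holds them as objects.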

@TomAugspurger
Contributor

Can you check out the PR where this was implemented? I think there was some discussion about what to do with DataFrame.

@TomAugspurger
Contributor

#24757 (comment)

So it seems you'll need numeric_only=False. However, that still fails, because the numeric_only=False code path mishandles the datetime column:

In [38]: df.mean(numeric_only=False)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
...
TypeError: unsupported operand type(s) for +: 'Timestamp' and 'Timestamp'
In [39]: df[['datetime']].mean(numeric_only=False)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-39-74680e367368> in <module>
----> 1 df[['datetime']].mean(numeric_only=False)

~/sandbox/pandas/pandas/core/generic.py in stat_func(self, axis, skipna, level, numeric_only, **kwargs)
  11577             return self._agg_by_level(name, axis=axis, level=level, skipna=skipna)
  11578         return self._reduce(
> 11579             f, name, axis=axis, skipna=skipna, numeric_only=numeric_only
  11580         )
  11581

~/sandbox/pandas/pandas/core/frame.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   7905             else:
   7906                 values = self.values
-> 7907             result = f(values)
   7908
   7909         if hasattr(result, "dtype") and is_object_dtype(result.dtype):

~/sandbox/pandas/pandas/core/frame.py in f(x)
   7826
   7827         def f(x):
-> 7828             return op(x, axis=axis, skipna=skipna, **kwds)
   7829
   7830         # exclude timedelta/datetime unless we are uniform types

~/sandbox/pandas/pandas/core/nanops.py in _f(*args, **kwargs)
     63             if any(self.check(obj) for obj in obj_iter):
     64                 msg = "reduction operation {name!r} not allowed for this dtype"
---> 65                 raise TypeError(msg.format(name=f.__name__.replace("nan", "")))
     66             try:
     67                 with np.errstate(invalid="ignore"):

TypeError: reduction operation 'mean' not allowed for this dtype

Both calls end up operating on df.values rather than applying the reduction per block.
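A rough illustration of why going through df.values breaks the first call: a frame mixing int64 and datetime64 columns has no common numeric dtype, so df.values is an object array of ints and Timestamps, and summing Timestamps with `+` is undefined. Reducing the column on its own avoids this. This sketch reuses the two-column frame from the report; `sum_failed` is just an illustrative flag, and the code does not reproduce pandas' internal reduction path.

```python
import pandas as pd

df = pd.DataFrame({
    "numeric": [1, 2, 3],
    "datetime": pd.to_datetime(["2014-07-09", "2014-07-10", "2014-07-11"]),
})

# One 2-D array for the whole frame: int64 and datetime64[ns] share no
# numeric dtype, so everything is upcast to object.
values = df.values
print(values.dtype)  # object

# Summing the object column falls back to Python's "+", which is not defined
# between two Timestamps (the same TypeError as in the first traceback).
sum_failed = False
try:
    values[:, 1].sum()
except TypeError as exc:
    sum_failed = True
    print("TypeError:", exc)

# Reducing the datetime column by itself keeps datetime64[ns], where mean() works.
datetime_mean = df["datetime"].mean()
print(datetime_mean)
```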

@jbrockmendel
Member

IIRC we had a PR that fixed nanmean for datetime64, but we reverted it because it changed the behavior of df.mean. I think we were going to do a deprecation cycle.
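For context on what a datetime64-aware mean has to do: one common approach is to view the timestamps as int64 nanoseconds since the epoch, average those, and convert back. The sketch below shows that idea in plain NumPy; it is not pandas' actual nanops code, and it assumes there are no NaT values (NaT would have to be masked out first).

```python
import numpy as np
import pandas as pd

arr = np.array(["2014-07-09", "2014-07-10", "2014-07-11"], dtype="datetime64[ns]")

# Reinterpret each timestamp as int64 nanoseconds since 1970-01-01.
ns = arr.view("i8")

# Average in float64, round to the nearest nanosecond, and wrap as a Timestamp.
# Assumes no NaT entries: NaT views as the minimum int64 and would skew the mean.
mean_ts = pd.Timestamp(int(round(ns.mean())))
print(mean_ts)
```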

@jbrockmendel jbrockmendel self-assigned this Oct 16, 2019
@jbrockmendel jbrockmendel added the Numeric Operations Arithmetic, Comparison, and Logical operations label Oct 16, 2019
@jbrockmendel jbrockmendel added Reduction Operations sum, mean, min, max, etc. Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply labels Sep 21, 2020
@jbrockmendel
Member

Closed by #29941


3 participants