[FEA] Use _get_numeric_types in numeric reductions #2067
Comments
So, to be fully explicit, we would want changes like the following:

 def mean(self, **kwargs):
-    return self._apply_support_method('mean', **kwargs)
+    return self._get_numeric_data()._apply_support_method('mean', **kwargs)

Along with tests that this works when we have a small dataframe with both numeric and text dtypes.
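A test of the kind suggested above might look like the following sketch. It uses pandas as a stand-in so that it runs without a GPU, and it uses the public `select_dtypes(include="number")` rather than the private `_get_numeric_data()`. The helper name `numeric_mean` and the sample data are assumptions made for illustration, not part of cudf.

```python
import pandas as pd


def numeric_mean(df: pd.DataFrame) -> pd.Series:
    """Mean over numeric columns only (hypothetical helper, not cudf API)."""
    # Drop non-numeric columns first, then reduce.
    return df.select_dtypes(include="number").mean()


def test_mean_mixed_dtypes():
    # Small frame with integer, float, and text columns.
    df = pd.DataFrame(
        {"a": [1, 2, 3], "b": [0.5, 1.5, 2.5], "name": ["x", "y", "z"]}
    )
    result = numeric_mean(df)
    # The text column is excluded; numeric columns are reduced as usual.
    assert list(result.index) == ["a", "b"]
    assert result["a"] == 2.0
    assert result["b"] == 1.5


test_mean_mixed_dtypes()
```

The same test body should apply to a cudf DataFrame if the reduction selects numeric columns first, though that has not been verified here.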
I believe the behavior requested in this issue is now considered deprecated by pandas based on the following example:

import cudf
df = cudf.datasets.timeseries()
pdf = df.to_pandas()
pdf.mean()
/tmp/ipykernel_47228/2065928329.py:4: FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
pdf.mean()
id 999.979244
x 0.000081
y 0.000099
dtype: float64
df.mean()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_47228/3698961737.py in <module>
----> 1 df.mean()
~/conda/envs/rapids-22.02/lib/python3.8/site-packages/cudf/core/frame.py in mean(self, axis, skipna, level, numeric_only, **kwargs)
4304 dtype: float64
4305 """
-> 4306 return self._reduce(
4307 "mean",
4308 axis=axis,
~/conda/envs/rapids-22.02/lib/python3.8/site-packages/cudf/core/dataframe.py in _reduce(self, op, axis, level, numeric_only, **kwargs)
5326
5327 if axis == 0:
-> 5328 result = [
5329 getattr(self._data[col], op)(**kwargs)
5330 for col in self._data.names
~/conda/envs/rapids-22.02/lib/python3.8/site-packages/cudf/core/dataframe.py in <listcomp>(.0)
5327 if axis == 0:
5328 result = [
-> 5329 getattr(self._data[col], op)(**kwargs)
5330 for col in self._data.names
5331 ]
~/conda/envs/rapids-22.02/lib/python3.8/site-packages/cudf/core/column/column.py in mean(self, skipna, dtype)
1178
1179 def mean(self, skipna: bool = None, dtype: Dtype = None):
-> 1180 raise TypeError(f"cannot perform mean with type {self.dtype}")
1181
1182 def std(self, skipna: bool = None, ddof=1, dtype: Dtype = np.float64):
TypeError: cannot perform mean with type category

This is consistent with pandas-dev/pandas#41480. Rather than match deprecated functionality, I think our current behavior of throwing a TypeError should be considered correct. Separately, we have a
Rather than file a new issue, I'll add context here. In Python, we provide passthrough support for the Implementing support for |
Add support for numeric_only in DataFrame._reduce, so that calls such as df.mean(numeric_only=True) work. Resolves #2067. Also partially addresses #9009.

Authors:
- https://github.com/martinfalisse

Approvers:
- Michael Wang (https://github.com/isVoid)
- Vyas Ramasubramani (https://github.com/vyasr)

URL: #10629
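For reference, this is what the `numeric_only=True` call looks like in pandas, whose API the PR mirrors. This is a minimal sketch with made-up sample data; it runs on pandas so no GPU is needed, and per the PR description the equivalent call on a `cudf.DataFrame` would be `df.mean(numeric_only=True)`.

```python
import pandas as pd

# Sample frame with one text column (illustrative data only).
df = pd.DataFrame(
    {"id": [1, 2, 3], "name": ["a", "b", "c"], "x": [0.5, 1.5, 2.5]}
)

# Explicitly ask the reduction to skip non-numeric columns.
means = df.mean(numeric_only=True)
print(means)
```

Making the selection explicit at the call site avoids relying on the silent dropping of non-numeric columns that pandas deprecated.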
It would be great if the following worked:
Currently we get this:
Typically the solution in pandas is to strip out the non-numeric columns first:
We should consider adding this method call in all dataframe reductions that expect numeric data like sum, mean, prod, std, var, cum* and so on.
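The pattern described above, stripping non-numeric columns before each reduction, can be sketched as follows. It is written against pandas so that it runs anywhere, and it uses the public `select_dtypes(include="number")` as the selection step; the sample data and variable names are assumptions for illustration.

```python
import pandas as pd

# Illustrative frame mixing numeric and text columns.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [2.0, 4.0, 6.0], "label": ["x", "y", "z"]}
)

# Select numeric columns once, then apply each reduction to the result.
numeric = df.select_dtypes(include="number")

results = {
    op: getattr(numeric, op)()
    for op in ("sum", "mean", "prod", "std", "var")
}

# Cumulative operations return a frame rather than a single row of values.
cumulative = numeric.cumsum()

for op, value in results.items():
    print(op, value.to_dict())
print(cumulative)
```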