ENH: implement FloatingArray.round() #38866

Closed
wants to merge 10 commits into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.0.rst
@@ -54,6 +54,7 @@ Other enhancements
- Add support for dict-like names in :class:`MultiIndex.set_names` and :class:`MultiIndex.rename` (:issue:`20421`)
- :func:`pandas.read_excel` can now auto detect .xlsb files (:issue:`35416`)
- :meth:`.Rolling.sum`, :meth:`.Expanding.sum`, :meth:`.Rolling.mean`, :meth:`.Expanding.mean`, :meth:`.Rolling.median`, :meth:`.Expanding.median`, :meth:`.Rolling.max`, :meth:`.Expanding.max`, :meth:`.Rolling.min`, and :meth:`.Expanding.min` now support ``Numba`` execution with the ``engine`` keyword (:issue:`38895`)
- Added :meth:`NumericArray.round` (:issue:`38844`)
Member

NumericArray is not public, so we shouldn't mention it in the whatsnew notes. You can say something like "round() is now enabled for the nullable integer and floating dtypes".


.. ---------------------------------------------------------------------------

43 changes: 42 additions & 1 deletion pandas/core/arrays/numeric.py
@@ -1,10 +1,12 @@
import datetime
from typing import TYPE_CHECKING, Union
from typing import TYPE_CHECKING, Callable, Union

import numpy as np

from pandas._libs import Timedelta, missing as libmissing
from pandas.compat.numpy import function as nv
from pandas.errors import AbstractMethodError
from pandas.util._decorators import doc

from pandas.core.dtypes.common import (
is_float,
@@ -56,6 +58,32 @@ def __from_arrow__(
return array_class._concat_same_type(results)


_round_doc = """
Round each value in the NumericArray to the given number of decimals.

Parameters
----------
decimals : int, default 0
Number of decimal places to round to. If decimals is negative,
it specifies the number of positions to the left of the decimal point.
*args, **kwargs
Additional arguments and keywords have no effect but might be
accepted for compatibility with NumPy.

Returns
-------
NumericArray
Rounded values of the NumericArray.

See Also
--------
numpy.around : Round values of an np.array.
DataFrame.round : Round values of a DataFrame.
Series.round : Round values of a Series.

"""


class NumericArray(BaseMaskedArray):
"""
Base class for IntegerArray and FloatingArray.
@@ -130,3 +158,16 @@ def _arith_method(self, other, op):
)

return self._maybe_mask_result(result, mask, other, op_name)

def _apply(self, func: Callable, **kwargs) -> "NumericArray":
Contributor

cc @jbrockmendel @jorisvandenbossche

Should we make this more general (e.g. put it in base.py)?

Member

For this PR I would leave it here, doing only the minimum needed to implement round(), and have a follow-up to discuss how we might want to use this more generally, because we probably do want that.

values = self._data[~self._mask]
Contributor

Can you add a docstring?

Member

I don't think you actually need to subset the _data with the mask in this case, as "round" should work on all values, and I can't think of a case where it would error by being called on the "invalid" values hidden by the mask.

Of course, if many values are masked, we might be rounding more values than necessary. But the filter operation / copy also takes time, so it may be worth timing both approaches.
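
The trade-off raised in this comment can be checked with a small NumPy-only sketch. The arrays `data` and `mask` are stand-ins for `self._data` and `self._mask` (an assumption; no pandas internals are touched), and the helper names `round_valid_only` and `round_everything` are hypothetical. The sketch confirms that both strategies agree on the unmasked positions and prints rough timings, which will vary with array size and with the fraction of masked values.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=100_000)            # stand-in for self._data
mask = rng.random(size=data.shape) < 0.5   # True marks a missing value


def round_valid_only(data, mask, decimals=0):
    """Round only the unmasked values, as the diff under review does."""
    out = np.zeros_like(data)
    out[~mask] = np.round(data[~mask], decimals=decimals)
    return out


def round_everything(data, mask, decimals=0):
    """Round all values; entries hidden by the mask are simply ignored later."""
    return np.round(data, decimals=decimals)


a = round_valid_only(data, mask, decimals=2)
b = round_everything(data, mask, decimals=2)

# Both strategies agree wherever the mask marks a value as valid.
same_on_valid = np.array_equal(a[~mask], b[~mask])

# Rough timings only; no particular winner is assumed here.
t_subset = timeit.timeit(lambda: round_valid_only(data, mask, 2), number=20)
t_full = timeit.timeit(lambda: round_everything(data, mask, 2), number=20)
print(f"subset: {t_subset:.4f}s  full: {t_full:.4f}s  equal on valid: {same_on_valid}")
```

Changing the `0.5` threshold to something like `0.99` or `0.01` is a quick way to see how the masked fraction shifts the balance.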

values = np.round(values, **kwargs)
Contributor

This should call func rather than np.round :->


data = np.zeros(self._data.shape)
data[~self._mask] = values
return type(self)(data, self._mask)
Member

I think the mask needs to be copied: the result should not share a mask with the original array, because otherwise editing one can modify the other. We should probably also test this.

Member Author

That's a good point (the same bug already exists in my implementation of to_numeric for EAs, #38974). I'll fix this and add tests.
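
The two review points on `_apply` (call the `func` that was passed in, and do not share the mask) can be illustrated with a NumPy-only sketch. `MaskedSketch` is a hypothetical stand-in, not the pandas class, and the `copy_mask` flag exists only to demonstrate the aliasing problem side by side. Using `np.zeros_like` instead of `np.zeros(self._data.shape)` is also a choice made for this sketch, not something taken from the PR.

```python
import numpy as np


class MaskedSketch:
    """Hypothetical stand-in for a masked array: values plus a boolean mask."""

    def __init__(self, data, mask):
        self._data = data
        self._mask = mask  # True marks a missing value

    def _apply(self, func, copy_mask=True, **kwargs):
        # Call the function that was passed in instead of hard-coding np.round.
        data = np.zeros_like(self._data)
        data[~self._mask] = func(self._data[~self._mask], **kwargs)
        # Copy the mask so the result does not alias the original array's mask.
        mask = self._mask.copy() if copy_mask else self._mask
        return type(self)(data, mask)


arr = MaskedSketch(np.array([1.26, 2.0, 3.74]), np.array([False, True, False]))

# Shared mask: editing the result silently edits the original as well.
shared = arr._apply(np.round, copy_mask=False, decimals=1)
shared._mask[0] = True
leaked = bool(arr._mask[0])

# Copied mask: the original stays untouched.
arr._mask[0] = False  # restore the original state
copied = arr._apply(np.round, decimals=1)
copied._mask[0] = True
isolated = not arr._mask[0]

print(f"shared mask leaked: {leaked}, copied mask isolated: {isolated}")
```

A regression test along these lines could assert `not np.shares_memory(result._mask, original._mask)`; `np.shares_memory` is a real NumPy function, while the attribute names here simply follow the diff.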


@doc(_round_doc)
Member

Can the docstring be moved inline?

def round(self, decimals: int = 0, *args, **kwargs) -> "NumericArray":
nv.validate_round(args, kwargs)
Member

If we accept args/kwargs here and validate them, then we should also test this (e.g. calling np.round(float_arr) triggers this path).
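
Why `np.round(float_arr)` reaches the validation path can be seen with a short sketch. For an object that is not an ndarray but has a `round` method, NumPy delegates to that method and passes its own keywords along, including `out`. `RoundRecorder` below is a hypothetical class used only to record the call; it does not use pandas or `nv.validate_round`.

```python
import numpy as np


class RoundRecorder:
    """Hypothetical object that records how its round() method is called."""

    def __init__(self):
        self.calls = []

    def round(self, decimals=0, *args, **kwargs):
        # Record everything NumPy forwards, then return self unchanged.
        self.calls.append((decimals, args, kwargs))
        return self


obj = RoundRecorder()
result = np.round(obj)  # NumPy finds obj.round and delegates to it

decimals, args, kwargs = obj.calls[0]
print(decimals, args, kwargs)
```

The extra `out` keyword is the kind of NumPy-compatibility argument that a `round(self, decimals=0, *args, **kwargs)` signature has to accept and check, so a test that calls `np.round` on the array exercises that branch.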

return self._apply(np.round, decimals=decimals, **kwargs)
20 changes: 14 additions & 6 deletions pandas/tests/series/methods/test_round.py
@@ -35,15 +35,23 @@ def test_round_numpy_with_nan(self):
expected = Series([2.0, np.nan, 0.0])
tm.assert_series_equal(result, expected)

def test_round_builtin(self):
ser = Series([1.123, 2.123, 3.123], index=range(3))
result = round(ser)
expected_rounded0 = Series([1.0, 2.0, 3.0], index=range(3))
def test_round_builtin(self, any_float_allowed_nullable_dtype):
ser = Series(
[1.123, 2.123, 3.123],
index=range(3),
dtype=any_float_allowed_nullable_dtype,
)
result = round(ser).astype(any_float_allowed_nullable_dtype)
expected_rounded0 = Series(
[1.0, 2.0, 3.0], index=range(3), dtype=any_float_allowed_nullable_dtype
)
tm.assert_series_equal(result, expected_rounded0)

decimals = 2
expected_rounded = Series([1.12, 2.12, 3.12], index=range(3))
result = round(ser, decimals)
expected_rounded = Series(
[1.12, 2.12, 3.12], index=range(3), dtype=any_float_allowed_nullable_dtype
)
result = round(ser, decimals).astype(any_float_allowed_nullable_dtype)
tm.assert_series_equal(result, expected_rounded)

@pytest.mark.parametrize("method", ["round", "floor", "ceil"])