Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: implement FloatingArray.round() #38866

Closed
wants to merge 10 commits into from

Conversation

arw2019
Copy link
Member

@arw2019 arw2019 commented Dec 31, 2020

@arw2019 arw2019 marked this pull request as draft December 31, 2020 21:43
@arw2019 arw2019 added Series Series data structure NA - MaskedArrays Related to pd.NA and nullable extension arrays labels Dec 31, 2020
@@ -410,6 +410,14 @@ def max(self, *, skipna=True, **kwargs):
nv.validate_max((), kwargs)
return super()._reduce("max", skipna=skipna)

def round(self, decimals=0):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should do this generically (and move to NumericArray), e.g. something like

def round(self, decimals=0):
     return self._apply(np.round, decimals=decimals)

where _apply handles the masking & recasting. I think we discussed similar in .to_numeric

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or event better

round = _apply_function(np.round, decimals=0, "nice doc-string here")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going with the second way (may as well) would _apply_function be method on NumericArray (versus a module level function)?

@simonjayhawkins simonjayhawkins added this to the 1.2.1 milestone Jan 1, 2021
@simonjayhawkins
Copy link
Member

milestoned as 1.2.1. ok to merge changes to experimental types in patch releases?

@jreback jreback modified the milestones: 1.2.1, 1.3 Jan 3, 2021
@jreback
Copy link
Contributor

jreback commented Jan 3, 2021

these generally should not be backported as EAs (esp) Floating are still experimental

@jreback
Copy link
Contributor

jreback commented Jan 3, 2021

milestoned as 1.2.1. ok to merge changes to experimental types in patch releases?

we can do on a case by case. the proposed fix here that i suggest is a bit more non-trivial.

@arw2019 arw2019 marked this pull request as ready for review January 4, 2021 22:49

def _apply(self, func: Callable, **kwargs) -> "NumericArray":
values = self._data[~self._mask]
values = np.round(values, **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be func :->

@@ -130,3 +158,16 @@ def _arith_method(self, other, op):
)

return self._maybe_mask_result(result, mask, other, op_name)

def _apply(self, func: Callable, **kwargs) -> "NumericArray":
values = self._data[~self._mask]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string

@@ -130,3 +158,16 @@ def _arith_method(self, other, op):
)

return self._maybe_mask_result(result, mask, other, op_name)

def _apply(self, func: Callable, **kwargs) -> "NumericArray":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @jbrockmendel @jorisvandenbossche

shall we make this more general? (e.g. on base.py)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I would for this PR leave it here in order to do the minimal to actually implement the round(), and have a follow-up to discuss how we might want to use this more general, because indeed we probably want that)

@jorisvandenbossche
Copy link
Member

Since I think being able to round is a quite essential functionality for a floating dtypes that we missed, I think it makes sense to backport this, exactly because it is still experimental anyway.
(since it's experimental anyway, we don't need to be as careful to backport as for other parts of pandas, and by backporting this, it will make it easier for users to actually already experiment with the new dtype in pandas 1.2.x, and thus we can get more feedback)

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arw2019 thanks for working on this!

I think this needs some more testing, and also for the actual array methods (I would maybe create a new test_function.py in the /tests/arrays/masked/ dir for this)

@@ -54,6 +54,7 @@ Other enhancements
- Add support for dict-like names in :class:`MultiIndex.set_names` and :class:`MultiIndex.rename` (:issue:`20421`)
- :func:`pandas.read_excel` can now auto detect .xlsb files (:issue:`35416`)
- :meth:`.Rolling.sum`, :meth:`.Expanding.sum`, :meth:`.Rolling.mean`, :meth:`.Expanding.mean`, :meth:`.Rolling.median`, :meth:`.Expanding.median`, :meth:`.Rolling.max`, :meth:`.Expanding.max`, :meth:`.Rolling.min`, and :meth:`.Expanding.min` now support ``Numba`` execution with the ``engine`` keyword (:issue:`38895`)
- Added :meth:`NumericArray.round` (:issue:`38844`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NumericArray is not public, and thus we shouldn't mention it in the whatsnew notes. You can say something about "round() being enabled for the nullable integer and floating dtypes"

data[~self._mask] = values
return type(self)(data, self._mask)

@doc(_round_doc)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring can be moved inline?

@@ -130,3 +158,16 @@ def _arith_method(self, other, op):
)

return self._maybe_mask_result(result, mask, other, op_name)

def _apply(self, func: Callable, **kwargs) -> "NumericArray":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I would for this PR leave it here in order to do the minimal to actually implement the round(), and have a follow-up to discuss how we might want to use this more general, because indeed we probably want that)

@@ -130,3 +158,16 @@ def _arith_method(self, other, op):
)

return self._maybe_mask_result(result, mask, other, op_name)

def _apply(self, func: Callable, **kwargs) -> "NumericArray":
values = self._data[~self._mask]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you actually need to subset the _data with the mask in this case, as "round" should work on all values, and I can't think of a case where it would error by being called on the "invalid" values hidden by the mask.

Of course, if many values are masked, we might be calculating round on too many values. But doing the filter operation / copy also takes time. Maybe something to time both ways.


data = np.zeros(self._data.shape)
data[~self._mask] = values
return type(self)(data, self._mask)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mask needs to be copied I think? (result should not share a mask with the original array, because otherwise editing one can modify the other. We should probably also test this)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point (and actually the same bug exists already in my implementation of to_numeric for EAs - #38974). I'll fix this and add tests


@doc(_round_doc)
def round(self, decimals: int = 0, *args, **kwargs) -> "NumericArray":
nv.validate_round(args, kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we accept args/kwargs here and validate them, then we should also test this (eg doing np.round(float_arr) triggers this)

@jreback
Copy link
Contributor

jreback commented Jan 6, 2021

Since I think being able to round is a quite essential functionality for a floating dtypes that we missed, I think it makes sense to backport this, exactly because it is still experimental anyway.
(since it's experimental anyway, we don't need to be as careful to backport as for other parts of pandas, and by backporting this, it will make it easier for users to actually already experiment with the new dtype in pandas 1.2.x, and thus we can get more feedback)

I don't think we should be backporting things like this.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 6, 2021

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Feb 6, 2021
@jreback
Copy link
Contributor

jreback commented Feb 11, 2021

closing as stale. I think we need the general soln outlined above. Happy to re-open / take new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NA - MaskedArrays Related to pd.NA and nullable extension arrays Series Series data structure Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pandas.Series.round not yet implemented for FloatingArray
4 participants