Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Implement rounding for floating dtype array #38844 #39751

Merged
merged 1 commit into from
Mar 8, 2021
Merged

ENH: Implement rounding for floating dtype array #38844 #39751

merged 1 commit into from
Mar 8, 2021

Conversation

benoit9126
Copy link
Contributor

Code proposed in the staled PR #38866

@jreback jreback added Enhancement ExtensionArray Extending pandas with custom dtypes or arrays. labels Feb 11, 2021
@jreback jreback added this to the 1.3 milestone Feb 11, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm minor comment

pandas/core/arrays/numeric.py Show resolved Hide resolved
@@ -188,3 +189,48 @@ def reconstruct(x):
return tuple(reconstruct(x) for x in result)
else:
return reconstruct(result)

def _apply(self, func: Callable, **kwargs) -> NumericArray:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @jorisvandenbossche @jbrockmendel do we want specific EA tests?

Copy link
Member

@jorisvandenbossche jorisvandenbossche Feb 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for general EA, but for the numeric arrays specifically, yes. As mentioned on the other PR, I would create a new test_function.py in the /tests/arrays/masked/ dir for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche I can add a file in the /tests/arrays/masked/folder in order to test the consistency between the numpy round and the pandas round for all the following FLOAT_EA_DTYPES types. Nevertheless, I now use the any_float_allowed_nullable_dtype fixture in the test test_round_numpy_with_nan of tests/series/methods/test_round.py.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add general EA tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a file in the /tests/arrays/masked/ folder in order to test the consistency between the numpy round and the pandas round for all the following FLOAT_EA_DTYPES types. Nevertheless, I now use the any_float_allowed_nullable_dtype fixture in the test test_round_numpy_with_nan of tests/series/methods/test_round.py.

I would still add a small test in /tests/arrays/masked/ as well. Since this is a public method on the integer and floating arrays, we should test that specifically as well (and not only through Series)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we should also test calling numpy.round on the array, see #38866 (comment)

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up!

See my review at the other PR at #38866 (review), there are still a few outstanding comments I think

index=range(3),
dtype=any_float_allowed_nullable_dtype,
)
result = round(ser).astype(any_float_allowed_nullable_dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the .astype call added

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it is not required.

@jreback
Copy link
Contributor

jreback commented Feb 21, 2021

can you merge master

The image of the NumericArray
"""
values = func(self._data, **kwargs)
return type(self)(values, self._mask.copy())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if func returns a view or is a no-op, would we need to the the same for the mask?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was related to a comment of @jorisvandenbossche in the original PR proposal #38866

The mask needs to be copied I think? (result should not share a mask with the original array, because otherwise editing one can modify the other. We should probably also test this)

Here, I think this method should only be used with functions which return a copy (which is the case of most numpy functions I know --- without the out argument). It may need to be précised in the docstring for future developpers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, see comment above as well. Let's not generalize this prematurely, just knowing (and documenting) the func returns a new object is fine for now IMO.

@@ -199,3 +201,48 @@ def reconstruct(x):
return tuple(reconstruct(x) for x in result)
else:
return reconstruct(result)

def _apply(self, func: Callable, **kwargs) -> NumericArray:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add type parameters to Callable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can replace it with Callable[[np.ndarray,...],np.ndarray]. Is it precise enough for you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Callable[[np.ndarray],np.ndarray] ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to add the ... for the **kwargs given to the func call.

@@ -199,3 +201,48 @@ def reconstruct(x):
return tuple(reconstruct(x) for x in result)
else:
return reconstruct(result)

def _apply(self, func: Callable, **kwargs) -> NumericArray:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the return type is the same as self?

Copy link
Contributor Author

@benoit9126 benoit9126 Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I return return type(self)(values, self._mask.copy()) so, yes, it is a NumericArray. Do you want me to be more precise using a TypeVar?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends on the func signature and the subclass constructors. If a func say changes dtype and the subclass constructor accepts the changed dtype. Is that what we want from apply? or would we want say an FloatArray if we apply a func with a signature Callable[[np.ndarray[int],...],np.ndarray[float]] to say a IntegerArray?

we may want to be more precise and restrictive on the callable and say that the func must return the same type (for now) and as a follow-up implement a more generic _apply function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned on the previous PR, I would keep it focused here on just getting it working for round. It's true that if we want such a generic _apply for several cases, all those questions come up (return type of the function, does the function return a view or new object, etc). But let's deal with those issues when they actually come up.

Personally I think we can also just put those 2 lines of code in round() itself for now, but documenting the current restrictions of _apply (func needs to return new object of same dtype) is also fine for me.

values = func(self._data, **kwargs)
return type(self)(values, self._mask.copy())

def round(self, decimals: int = 0, *args, **kwargs) -> NumericArray:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return type is same as self?

values = func(self._data, **kwargs)
return type(self)(values, self._mask.copy())

def round(self, decimals: int = 0, *args, **kwargs) -> NumericArray:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is *args needed? having *args after a keyword argument can be problematic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the same as already done in some other files of the library. For instance here

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates!


def round(self, decimals: int = 0, *args, **kwargs) -> NumericArray:
"""
Round each value in NumericArray a to the given number of decimals.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Round each value in NumericArray a to the given number of decimals.
Round each value in the array to the given number of decimals.

(NumericArray is not really public, so would try to avoid it in the docstring where it's not really needed)

@@ -188,3 +189,48 @@ def reconstruct(x):
return tuple(reconstruct(x) for x in result)
else:
return reconstruct(result)

def _apply(self, func: Callable, **kwargs) -> NumericArray:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a file in the /tests/arrays/masked/ folder in order to test the consistency between the numpy round and the pandas round for all the following FLOAT_EA_DTYPES types. Nevertheless, I now use the any_float_allowed_nullable_dtype fixture in the test test_round_numpy_with_nan of tests/series/methods/test_round.py.

I would still add a small test in /tests/arrays/masked/ as well. Since this is a public method on the integer and floating arrays, we should test that specifically as well (and not only through Series)

@@ -199,3 +201,48 @@ def reconstruct(x):
return tuple(reconstruct(x) for x in result)
else:
return reconstruct(result)

def _apply(self, func: Callable, **kwargs) -> NumericArray:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned on the previous PR, I would keep it focused here on just getting it working for round. It's true that if we want such a generic _apply for several cases, all those questions come up (return type of the function, does the function return a view or new object, etc). But let's deal with those issues when they actually come up.

Personally I think we can also just put those 2 lines of code in round() itself for now, but documenting the current restrictions of _apply (func needs to return new object of same dtype) is also fine for me.

@@ -188,3 +189,48 @@ def reconstruct(x):
return tuple(reconstruct(x) for x in result)
else:
return reconstruct(result)

def _apply(self, func: Callable, **kwargs) -> NumericArray:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we should also test calling numpy.round on the array, see #38866 (comment)

The image of the NumericArray
"""
values = func(self._data, **kwargs)
return type(self)(values, self._mask.copy())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, see comment above as well. Let's not generalize this prematurely, just knowing (and documenting) the func returns a new object is fine for now IMO.

@benoit9126
Copy link
Contributor Author

Thanks for all your reviews! I will take time to implement them, but I can't before Saturday.

@benoit9126
Copy link
Contributor Author

benoit9126 commented Mar 1, 2021

I pushed some modifications:

  • I added a small test_function.py in tests/arrays/masked/
  • I merged the _apply and the round methods to have only the round method
  • I merged master

@jreback
Copy link
Contributor

jreback commented Mar 1, 2021

some precommit issues

Performing static analysis using mypy
pandas/core/arrays/numeric.py:241: error: "T" has no attribute "_data"  [attr-defined]
pandas/core/arrays/numeric.py:242: error: Too many arguments for "object"  [call-arg]
pandas/core/arrays/numeric.py:242: error: "T" has no attribute "_mask"  [attr-defined]
Found 3 errors in 1 file (checked 1242 source files)

@benoit9126
Copy link
Contributor Author

@jreback two tests failed in code that I did not write/modify? Is this normal?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. @jorisvandenbossche if any comments.

@benoit9126 the failures are know on that build currently.

@jreback
Copy link
Contributor

jreback commented Mar 5, 2021

thanks for merging master @benoit9126 this lgtm.

@jreback jreback merged commit ee2b8b5 into pandas-dev:master Mar 8, 2021
@jreback
Copy link
Contributor

jreback commented Mar 8, 2021

thanks @benoit9126

@benoit9126 benoit9126 deleted the floating_round_gh38844 branch March 8, 2021 15:34
@jorisvandenbossche
Copy link
Member

Thanks a lot @benoit9126 !

jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this pull request Mar 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pandas.Series.round not yet implemented for FloatingArray
4 participants