Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: BooleanArray any/all with NA logic #30062

Merged

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Dec 4, 2019

Closes #29686

Implementation and tests for any/all with the updated logic as discussed in the linked issue.

@@ -557,6 +557,30 @@ def _values_for_argsort(self) -> np.ndarray:
data[self._mask] = -1
return data

def any(self, skipna=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with np.any with this? Do we need any keywords for compatibility?

Is the expected behavior here different from nanops.nanany? / nanops.nanall?

@jorisvandenbossche
Copy link
Member Author

What happens with np.any with this? Do we need any keywords for compatibility?

Yes, still need to do that. If we want this to work (without getting into __array_function__ for now), we need to add at least axis and out.

Is the expected behavior here different from nanops.nanany? / nanops.nanall?

Ah, didn't look yet at those. They actually accept a mask. The approach they take is to fill the missing values with a fill_value (instead of filtering as I did here).
But, we would still need the custom logic to decide when something should return pd.NA or not, so not fully sure it is worth to reuse those (will also do some timings tomorrow).

Also still need to add docstrings.

@jorisvandenbossche jorisvandenbossche added this to the 1.0 milestone Dec 6, 2019
@jorisvandenbossche
Copy link
Member Author

Is the expected behavior here different from nanops.nanany? / nanops.nanall?

So I didn't use those methods, because indeed the behaviour that is now implemented in nanany/nanall for the skipna=False case is different.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementing these directly on BooleanArray makes sense to me.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add this issue number in the whatsnew where BooleanArray was added

@@ -560,6 +561,143 @@ def _values_for_argsort(self) -> np.ndarray:
data[self._mask] = -1
return data

def any(self, skipna=True, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if skipna:
return result
else:
if result or len(self) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use not len(self)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In pandas/core, we actually use the len(..) == 0 pattern more than not len(..). I personally also find that easier to read.

(the typical pythonic idiom recommendation is about doing if (not) container: instead of if (not) len(container) for empty containers, but that of course doesn't hold for arrays)

else:
return self.dtype.na_value

def all(self, skipna=True, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


See Also
--------
numpy.all : Numpy version of this method.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might want to add a link for kleene logic here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the See Also section, we can only add links to other API pages. But, in the long description of the docstring a bit above, I already included a link about the Kleene logic.

if skipna:
return result
else:
if not result or len(self) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -656,6 +794,10 @@ def cmp_method(self, other):
return set_function_name(cmp_method, name, cls)

def _reduce(self, name, skipna=True, **kwargs):

if name in {"any", "all"}:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually use lists for these checks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this file we actually use more in {} than in [] (both are used), but since Tom and I wrote this file, that's probably not an argument ;)
Happy to change it, purely performance wise the set is faster (but this is about nanoseconds of course ..)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, I'm probably to blame for the sets :) I like them more for membership tests, though it doesn't matter for small sets.

Copy link
Member Author

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!


See Also
--------
numpy.all : Numpy version of this method.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the See Also section, we can only add links to other API pages. But, in the long description of the docstring a bit above, I already included a link about the Kleene logic.

@@ -560,6 +561,143 @@ def _values_for_argsort(self) -> np.ndarray:
data[self._mask] = -1
return data

def any(self, skipna=True, **kwargs):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if skipna:
return result
else:
if result or len(self) == 0:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In pandas/core, we actually use the len(..) == 0 pattern more than not len(..). I personally also find that easier to read.

(the typical pythonic idiom recommendation is about doing if (not) container: instead of if (not) len(container) for empty containers, but that of course doesn't hold for arrays)

@@ -656,6 +794,10 @@ def cmp_method(self, other):
return set_function_name(cmp_method, name, cls)

def _reduce(self, name, skipna=True, **kwargs):

if name in {"any", "all"}:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this file we actually use more in {} than in [] (both are used), but since Tom and I wrote this file, that's probably not an argument ;)
Happy to change it, purely performance wise the set is faster (but this is about nanoseconds of course ..)

else:
return self.dtype.na_value

def all(self, skipna=True, **kwargs):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jorisvandenbossche
Copy link
Member Author

This is good to go?

The failure on Azure is the flaky resource warning thing.

@jorisvandenbossche jorisvandenbossche merged commit cceef8e into pandas-dev:master Dec 12, 2019
@jorisvandenbossche jorisvandenbossche deleted the EA-bool-any-all branch December 12, 2019 13:24
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API: any/all in context of boolean dtype with missing values
3 participants