API: BooleanArray any/all with NA logic #30062

jorisvandenbossche · 2019-12-04T19:53:01Z

Implementation and tests for any/all with the updated logic as discussed in the linked issue.

TomAugspurger · 2019-12-04T20:29:48Z

pandas/core/arrays/boolean.py

@@ -557,6 +557,30 @@ def _values_for_argsort(self) -> np.ndarray:
        data[self._mask] = -1
        return data

+    def any(self, skipna=True):


What happens with np.any with this? Do we need any keywords for compatibility?

Is the expected behavior here different from nanops.nanany? / nanops.nanall?

jorisvandenbossche · 2019-12-04T20:52:59Z

> What happens with np.any with this? Do we need any keywords for compatibility?

Yes, I still need to do that. If we want this to work (without getting into __array_function__ for now), we need to add at least the axis and out keywords.
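For context: in current NumPy versions, when np.any is called on an object that is not an ndarray but defines an any method, NumPy forwards the call as obj.any(axis=..., out=...), which is why those keywords are needed. A minimal sketch, assuming a hypothetical ToyMaskedBool class (not the pandas implementation) that accepts the keywords and rejects anything beyond the 1-D default:

```python
import numpy as np


class ToyMaskedBool:
    """Hypothetical masked boolean array, for illustration only."""

    def __init__(self, values, mask):
        self._data = np.asarray(values, dtype=bool)
        self._mask = np.asarray(mask, dtype=bool)

    def any(self, skipna=True, axis=None, out=None, **kwargs):
        # np.any(obj) calls obj.any(axis=..., out=...), so these keywords must be
        # accepted; this sketch only supports the 1-D, no-output-buffer case.
        if axis not in (None, 0):
            raise ValueError("only axis=None or axis=0 is supported")
        if out is not None:
            raise ValueError("the 'out' parameter is not supported")
        # Missing entries are treated as False (skipna=False is not handled here).
        values = self._data.copy()
        np.putmask(values, self._mask, False)
        return bool(values.any())


arr = ToyMaskedBool([False, True, False], [False, False, True])
print(np.any(arr))  # True: NumPy dispatched to ToyMaskedBool.any
```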

> Is the expected behavior here different from nanops.nanany? / nanops.nanall?

Ah, I hadn't looked at those yet. They actually accept a mask: their approach is to fill the missing values with a fill_value (instead of filtering them out, as I did here).
But we would still need custom logic to decide whether or not to return pd.NA, so I'm not fully sure it is worth reusing them (I'll also do some timings tomorrow).
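The two strategies give the same result for skipna=True: filtering drops the missing positions, while filling overwrites them with the identity element of the reduction (False for any, True for all) so they cannot change the result. A small NumPy sketch, assuming the data and mask are plain boolean arrays; the function names are hypothetical, and the fill variant mirrors the copy + putmask approach named in a later commit of this PR:

```python
import numpy as np


def any_by_filter(data: np.ndarray, mask: np.ndarray) -> bool:
    # Boolean indexing builds a new array containing only the valid values.
    return bool(data[~mask].any())


def any_by_fill(data: np.ndarray, mask: np.ndarray) -> bool:
    # Copy, then overwrite missing positions with False, the identity of "or".
    values = data.copy()
    np.putmask(values, mask, False)
    return bool(values.any())


def all_by_fill(data: np.ndarray, mask: np.ndarray) -> bool:
    # For "all", the identity element is True.
    values = data.copy()
    np.putmask(values, mask, True)
    return bool(values.all())


data = np.array([True, False, False])
mask = np.array([True, False, False])  # the True at position 0 is "missing"
print(any_by_filter(data, mask), any_by_fill(data, mask))  # False False
```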

I also still need to add docstrings.

jorisvandenbossche · 2019-12-09T17:38:37Z

> Is the expected behavior here different from nanops.nanany? / nanops.nanall?

So I didn't use those functions, because the behaviour currently implemented in nanany/nanall for the skipna=False case is indeed different.
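The skipna=False behaviour under discussion follows Kleene (three-valued) logic: any is True as soon as one valid value is True, and all is False as soon as one valid value is False; otherwise the result is NA if any value is missing, and the plain result if none is. A dependency-free sketch of that rule, where kleene_any and kleene_all are hypothetical helpers and None stands in for pd.NA (this is not the PR's code):

```python
from typing import Optional, Sequence


def kleene_any(values: Sequence[Optional[bool]], skipna: bool = True) -> Optional[bool]:
    """any() over values, where None marks a missing entry."""
    valid = [v for v in values if v is not None]
    result = any(valid)
    if skipna or result:
        # A single True decides the result regardless of missing values.
        return result
    # No valid True: the answer is unknown only if something is missing.
    return None if len(valid) < len(values) else False


def kleene_all(values: Sequence[Optional[bool]], skipna: bool = True) -> Optional[bool]:
    """all() over values, where None marks a missing entry."""
    valid = [v for v in values if v is not None]
    result = all(valid)
    if skipna or not result:
        # A single False decides the result regardless of missing values.
        return result
    return None if len(valid) < len(values) else True


print(kleene_any([False, None], skipna=False))  # None (i.e. NA)
print(kleene_any([True, None], skipna=False))   # True
print(kleene_all([True, None], skipna=False))   # None (i.e. NA)
print(kleene_all([False, None], skipna=False))  # False
```

Note that an empty input gives False for any and True for all, matching the len(self) == 0 special case in the diff.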

TomAugspurger

Implementing these directly on BooleanArray makes sense to me.

jreback

can you add this issue number in the whatsnew where BooleanArray was added

jreback · 2019-12-10T13:06:53Z

pandas/core/arrays/boolean.py

@@ -560,6 +561,143 @@ def _values_for_argsort(self) -> np.ndarray:
        data[self._mask] = -1
        return data

+    def any(self, skipna=True, **kwargs):


can you add type annotations?

jreback · 2019-12-10T13:07:43Z

pandas/core/arrays/boolean.py

+        if skipna:
+            return result
+        else:
+            if result or len(self) == 0:


use not len(self)

In pandas/core, we actually use the len(..) == 0 pattern more than not len(..). I personally also find that easier to read.

(The usual Pythonic idiom is to write if (not) container: rather than if (not) len(container): when checking for empty containers, but that of course doesn't hold for arrays.)
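The container idiom does not transfer because NumPy refuses to reduce a multi-element array to a single truth value (and truth-testing an empty array is deprecated), so an explicit length check is the unambiguous spelling. A short sketch with plain NumPy arrays:

```python
import numpy as np

values = np.array([True, False])
empty = np.array([], dtype=bool)

try:
    # Ambiguous for a multi-element array: should this mean any() or all()?
    if values:
        pass
except ValueError as exc:
    print("ValueError:", exc)

# An explicit length check is unambiguous for arrays of any size.
print(len(values) == 0, len(empty) == 0)  # False True
```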

jreback · 2019-12-10T13:07:54Z

pandas/core/arrays/boolean.py

+            else:
+                return self.dtype.na_value
+
+    def all(self, skipna=True, **kwargs):


jreback · 2019-12-10T13:08:10Z

pandas/core/arrays/boolean.py

+
+        See Also
+        --------
+        numpy.all : Numpy version of this method.


might want to add a link for Kleene logic here

In the See Also section we can only link to other API pages. But I already included a link about Kleene logic in the extended description of the docstring, a bit above.
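A sketch of that split in a numpydoc-style docstring: free-form explanation and links go in the extended summary, while See Also lists only other documented API objects. The function is a hypothetical placeholder, and the :ref: target name kleene-logic is an assumption, not necessarily the label used in the pandas docs:

```python
def any_sketch(skipna: bool = True):
    """
    Return whether any element is True.

    Returns False unless there is at least one element that is True.
    With ``skipna=False``, missing values follow Kleene logic; see
    :ref:`kleene-logic` for details (placeholder cross-reference).

    Parameters
    ----------
    skipna : bool, default True
        Exclude NA values. If the entire array is NA and `skipna` is
        True, then the result will be False, as for an empty array.

    Returns
    -------
    bool or NA

    See Also
    --------
    numpy.any : Numpy version of this method.
    """
    raise NotImplementedError("docstring illustration only")
```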

jreback · 2019-12-10T13:08:17Z

pandas/core/arrays/boolean.py

+        if skipna:
+            return result
+        else:
+            if not result or len(self) == 0:


same as above

jreback · 2019-12-10T13:09:04Z

pandas/core/arrays/boolean.py

@@ -656,6 +794,10 @@ def cmp_method(self, other):
        return set_function_name(cmp_method, name, cls)

    def _reduce(self, name, skipna=True, **kwargs):
+
+        if name in {"any", "all"}:


we usually use lists for these checks

In this file we actually use in {} more often than in [] (both are used), but since Tom and I wrote this file, that's probably not an argument ;)
Happy to change it. Purely performance-wise the set is faster (but this is about nanoseconds, of course).

Heh, I'm probably to blame for the sets :) I like them more for membership tests, though it doesn't matter for small sets.
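The _reduce hook in the diff routes reductions requested by name to the array's own any/all methods. A minimal sketch of that dispatch pattern, including the set-literal membership test being discussed; ToyBoolArray is hypothetical and only implements the skipna=True path, so it is not pandas code:

```python
import numpy as np


class ToyBoolArray:
    """Hypothetical masked boolean array used only to show the dispatch."""

    def __init__(self, values, mask):
        self._data = np.asarray(values, dtype=bool)
        self._mask = np.asarray(mask, dtype=bool)

    def any(self, skipna=True, **kwargs):
        values = self._data.copy()
        np.putmask(values, self._mask, False)  # missing values cannot make it True
        return bool(values.any())

    def all(self, skipna=True, **kwargs):
        values = self._data.copy()
        np.putmask(values, self._mask, True)  # missing values cannot make it False
        return bool(values.all())

    def _reduce(self, name, skipna=True, **kwargs):
        # A set or list literal gives the same answer here; with two items the
        # speed difference is negligible, so this is mostly a style choice.
        if name in {"any", "all"}:
            return getattr(self, name)(skipna=skipna, **kwargs)
        raise TypeError(f"reduction {name!r} is not supported in this sketch")


arr = ToyBoolArray([True, False], [False, True])
print(arr._reduce("any"), arr._reduce("all"))  # True True
```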

jorisvandenbossche

Thanks for the review!

jorisvandenbossche · 2019-12-10T13:22:36Z

jorisvandenbossche · 2019-12-10T13:22:47Z

jorisvandenbossche · 2019-12-10T13:31:46Z

jorisvandenbossche · 2019-12-10T13:32:36Z

jorisvandenbossche · 2019-12-10T13:32:57Z

jorisvandenbossche · 2019-12-11T09:27:26Z

Is this good to go?

The failure on Azure is the flaky resource warning issue.

jorisvandenbossche added 3 commits December 4, 2019 20:51

API: BooleanArray any/all with NA logic

0bf654e

use in Series implementation

043f257

clean-up numpy scalars

12d2729

This was referenced Dec 4, 2019

Use new NA scalar in BooleanArray #29961

Merged

Missing values proposal: concrete steps for 1.0 #29556

Closed

TomAugspurger reviewed Dec 4, 2019

handle numpy compat

15471d8

jorisvandenbossche added this to the 1.0 milestone Dec 6, 2019

jreback added the API Design label Dec 8, 2019

jorisvandenbossche added 3 commits December 9, 2019 10:45

Merge remote-tracking branch 'upstream/master' into EA-bool-any-all

6ca6945

more efficient implementation with copy + putmask instead of filter

e59e91f

add docstrings

24797d4

TomAugspurger approved these changes Dec 9, 2019

jreback requested changes Dec 10, 2019

jorisvandenbossche commented Dec 10, 2019

type

ec7d072

jorisvandenbossche merged commit cceef8e into pandas-dev:master Dec 12, 2019

jorisvandenbossche deleted the EA-bool-any-all branch December 12, 2019 13:24

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

API: BooleanArray any/all with NA logic (pandas-dev#30062)

1e7a3fc

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

API: BooleanArray any/all with NA logic (pandas-dev#30062)

166b3e0
