BUG: Array.__setitem__ failing with nullable boolean mask #31484

Merged
merged 20 commits into from
Feb 1, 2020

Member

@charlesdong1991 charlesdong1991 commented Jan 31, 2020

Member

@WillAyd WillAyd left a comment

Looks generally good

@@ -70,6 +70,7 @@ Indexing

-
-
- Bug in ``__setitem__`` in ``Array`` which fails with nullable boolean mask (:issue:`31446`)
Member

Suggested change
- Bug in ``__setitem__`` in ``Array`` which fails with nullable boolean mask (:issue:`31446`)
- Bug where assigning to a Series using a IntegerArray / BooleanArray as a mask would raise ``TypeError`` (:issue:`31446`)
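For readers who want to see the behavior this whatsnew entry describes, here is a minimal sketch. It assumes a pandas version that includes this fix (1.0.1 or later); the variable names are illustrative and not taken from the PR.

```python
import pandas as pd

# A nullable boolean mask (BooleanArray), as opposed to a NumPy bool array.
mask = pd.array([True, True, True, False, False], dtype="boolean")

# Assigning into a Series through the nullable mask.
ser = pd.Series([1, 2, 3, 4, 5])
ser[mask] = 0

# The same assignment on a nullable integer array (IntegerArray).
arr = pd.array([1, 2, 3, 4, 5], dtype="Int64")
arr[mask] = 0

print(ser.tolist())
print(arr.tolist())
```

On pandas 1.0.0 the assignments above are the kind of operation reported as raising ``TypeError`` in the linked issue.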

@WillAyd WillAyd added the NA - MaskedArrays Related to pd.NA and nullable extension arrays label Jan 31, 2020
@WillAyd WillAyd added this to the 1.0.1 milestone Jan 31, 2020
Member

@jorisvandenbossche jorisvandenbossche left a comment

Added some comments.
These are not strictly needed to fix the regression (only the nullable integer array existed before), but I think it would be good to fix this more generally.

pandas/core/arrays/integer.py (resolved)
pandas/tests/arrays/test_integer.py (resolved)
@TomAugspurger
Contributor

@charlesdong1991 I haven't looked closely, but you may need to skip that setitem test for PandasArray. There are some issues with setting sized values in an ndarray.

@charlesdong1991
Member Author

charlesdong1991 commented Jan 31, 2020

ahh, thanks a lot for letting me know!! @TomAugspurger

I am indeed struggling with one failing test, which is caused by PandasArray.
Judging from the error message, the array after assignment is pd.array([0, 0, 0, (3,), (4,)]), while the expected array is pd.array([(0,), (0,), (0,), (3,), (4,)]).

I will skip this for now; please let me know whether you think a new issue needs to be created.

Just FYI, the only failing test:


self = <pandas.tests.extension.test_numpy.TestSetitem object at 0x14d3eea20>
data = <PandasArray>
[    0,     0,     0,  (3,),  (4,),  (5,),  (6,),  (7,),  (8,),  (9,), (10,),
 (11,), (12,), (13,), (14,...(87,),
 (88,), (89,), (90,), (91,), (92,), (93,), (94,), (95,), (96,), (97,), (98,),
 (99,)]
Length: 100, dtype: object

    def test_setitem_nullable_mask(self, data):
        # GH 31446
        arr = data[:5]
        expected = data.take([0, 0, 0, 3, 4])
        mask = pd.array([True, True, True, False, False])
        arr[mask] = data[0]
>       self.assert_extension_array_equal(expected, arr)

pandas/tests/extension/base/setitem.py:205: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/_testing.py:1057: in assert_extension_array_equal
    obj="ExtensionArray",
pandas/_libs/testing.pyx:65: in pandas._libs.testing.assert_almost_equal
    cpdef assert_almost_equal(a, b,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   raise_assert_detail(obj, msg, lobj, robj)
E   AssertionError: ExtensionArray are different
E   
E   ExtensionArray values are different (60.0 %)
E   [left]:  [(0,), (0,), (0,), (3,), (4,)]
E   [right]: [0, 0, 0, (3,), (4,)]
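The mismatch in this traceback can be reproduced with plain NumPy. The sketch below assumes, as the earlier comment about setting sized values in an ndarray suggests, that the cause is how an object-dtype ndarray handles a tuple on the right-hand side of a masked assignment; the thread itself does not confirm this, and the variable names are illustrative.

```python
import numpy as np

# Build a 1-D object array whose elements are 1-tuples, like the test data.
values = np.empty(5, dtype=object)
for i in range(5):
    values[i] = (i,)

mask = np.array([True, True, True, False, False])

# NumPy treats the tuple (0,) as a length-1 sequence and broadcasts its
# single element, so the masked slots receive the int 0, not the tuple.
unpacked = values.copy()
unpacked[mask] = (0,)
print(unpacked)

# One possible workaround (an assumption, not something proposed in the PR):
# box the tuple in a length-1 object array so it is assigned as one element.
fill = np.empty(1, dtype=object)
fill[0] = (0,)
boxed = values.copy()
boxed[mask] = fill
print(boxed)
```

The first result matches the `[right]` side of the assertion error, and the second matches the `[left]` (expected) side.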

@@ -70,6 +70,7 @@ Indexing

-
-
- Bug where assigning to a ``Series`` using a IntegerArray / BooleanArray as a mask would raise ``TypeError`` (:issue:`31446`)
Member

@MarcoGorelli MarcoGorelli Feb 1, 2020

:class:`Series`

instead of

``Series``

so that there's a link in the docs?

Member Author

yeah, tbh i wasn't quite sure about the rule here, since i see three different representations used in whatsnew

:class:`Series`, ``Series``, Series

but it seems that using :class:`Series` is more consistent, so I will change it


def test_setitem_nullable_mask(self, data):
    # GH 31446
    # TODO: there is some issue with PandasArray, therefore,
Contributor

can you create an issue for this? (e.g. PandasArray support for setitem)

Member Author

will do

Contributor

@charlesdong1991 is there an open issue for this yet? I haven't found one, and it causes the geopandas CI to fail because, contrary to the comment, this test is not skipped.

Member

@martinfleis The issue for geopandas is not due to PandasArray not working, but rather because we need to update our __setitem__ implementation (similar to how I updated the __getitem__), will do a PR for that.
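As a rough illustration of what updating a third-party `__setitem__` could involve, the sketch below normalizes the key with `pandas.api.indexers.check_array_indexer` before delegating to NumPy. `ToyArray` is a hypothetical stand-in, not a real ExtensionArray, and this is not necessarily how geopandas implemented its change.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import check_array_indexer


class ToyArray:
    """Hypothetical minimal container backed by a NumPy array."""

    def __init__(self, values):
        self._data = np.asarray(values)

    def __len__(self):
        return len(self._data)

    def __setitem__(self, key, value):
        # Validate the key and convert array-like indexers (including a
        # nullable BooleanArray) into a NumPy array usable for indexing.
        key = check_array_indexer(self._data, key)
        self._data[key] = value


toy = ToyArray([1, 2, 3, 4, 5])
toy[pd.array([True, True, True, False, False], dtype="boolean")] = 0
print(toy._data)
```

How missing values in the mask are handled depends on the pandas version, so the sketch deliberately uses a mask without `pd.NA`.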

@jreback jreback merged commit 22f0d84 into pandas-dev:master Feb 1, 2020
@jreback
Contributor

jreback commented Feb 1, 2020

thanks @charlesdong1991

Successfully merging this pull request may close these issues.

REGR: Array.__setitem__ failing with nullable boolean mask
7 participants