BUG: pd.to_numeric does not copy _mask for ExtensionArrays #38974

Closed
arw2019 opened this issue Jan 5, 2021 · 0 comments · Fixed by #39049
Labels: Bug, Missing-data, NA - MaskedArrays

arw2019 commented Jan 5, 2021

In #38746, while implementing to_numeric for ExtensionArrays, I did not copy the mask of the original input. As a result, the (potentially) cast array shares its mask with the input, so setting a value to pd.NA in the input also marks that position as missing in the result. We should copy the mask.

In [9]: import pandas as pd
   ...: import pandas._testing as tm
   ...: 
   ...: arr = pd.array([1, 2, pd.NA], dtype="Int64")
   ...: 
   ...: result = pd.to_numeric(arr, downcast="integer")
   ...: expected = pd.array([1, 2, pd.NA], dtype="Int8")
   ...: tm.assert_extension_array_equal(result, expected)
   ...: 
   ...: arr[1] = pd.NA # should not modify result
   ...: tm.assert_extension_array_equal(result, expected)
   ...: 
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-9-f72c43e18273> in <module>
      9 
     10 arr[1] = pd.NA
---> 11 tm.assert_extension_array_equal(result, expected)

~/repos/pandas/pandas/_testing/asserters.py in assert_extension_array_equal(left, right, check_dtype, index_values, check_less_precise, check_exact, rtol, atol)
    794     left_na = np.asarray(left.isna())
    795     right_na = np.asarray(right.isna())
--> 796     assert_numpy_array_equal(
    797         left_na, right_na, obj="ExtensionArray NA mask", index_values=index_values
    798     )

    [... skipping hidden 1 frame]

~/repos/pandas/pandas/_testing/asserters.py in _raise(left, right, err_msg)
    699             diff = diff * 100.0 / left.size
    700             msg = f"{obj} values are different ({np.round(diff, 5)} %)"
--> 701             raise_assert_detail(obj, msg, left, right, index_values=index_values)
    702 
    703         raise AssertionError(err_msg)

~/repos/pandas/pandas/_testing/asserters.py in raise_assert_detail(obj, message, left, right, diff, index_values)
    629         msg += f"\n[diff]: {diff}"
    630 
--> 631     raise AssertionError(msg)
    632 
    633 

AssertionError: ExtensionArray NA mask are different

ExtensionArray NA mask values are different (33.33333 %)
[left]:  [False, True, True]
[right]: [False, False, True]
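
The aliasing can also be shown without to_numeric. The following is only a minimal sketch, assuming the public pd.arrays.IntegerArray constructor, which does not copy its inputs by default (copy=False). The names source, shared and copied are illustrative, and this is not the code from #38746 or the fix in #39049.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 0], dtype="int64")
mask = np.array([False, False, True])

# The constructor wraps the arrays it is given (copy=False by default),
# so `source` and `shared` hold the very same mask ndarray.
source = pd.arrays.IntegerArray(values, mask)
shared = pd.arrays.IntegerArray(values.astype("int8"), mask)

# Copying the mask gives the downcast array its own missing-value state.
copied = pd.arrays.IntegerArray(values.astype("int8"), mask.copy())

# Mutating the source writes into the shared mask in place.
source[1] = pd.NA

shared_na = shared.isna().tolist()
copied_na = copied.isna().tolist()
print(shared_na)  # position 1 is now flagged as missing here too
print(copied_na)  # unaffected by the mutation
```

The array built with mask.copy() keeps its original missing-value pattern, which is the behaviour the second assertion above expects from to_numeric.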

Thanks @jorisvandenbossche for pointing this out!

@arw2019 added the Bug, Needs Triage, ExtensionArray, NA - MaskedArrays and Missing-data labels and removed the Needs Triage and ExtensionArray labels on Jan 5, 2021
@jorisvandenbossche added this to the 1.3 milestone on Jan 5, 2021