Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Uses pd.NA in IntegerArray #29964

Merged
merged 57 commits into from
Dec 30, 2019

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Dec 2, 2019

Just a WIP for now. I haven't closely reviewed this since pd.NA was merged into master.

cc @jorisvandenbossche

Closes #15375

@jreback jreback added API Design Dtype Conversions Unexpected or buggy dtype conversions labels Dec 2, 2019
@jorisvandenbossche jorisvandenbossche added this to the 1.0 milestone Dec 2, 2019
@TomAugspurger TomAugspurger marked this pull request as ready for review December 3, 2019 15:15
@@ -377,14 +377,28 @@ def __getitem__(self, item):
return self._data[item]
return type(self)(self._data[item], self._mask[item])

def _coerce_to_ndarray(self):
def _coerce_to_ndarray(self, dtype=None, na_value=libmissing.NA):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll want to make a to_array that's basically this method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have to_numeric which is the canonical form of to_array (rather do conversions there)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jeff, the discussion about to_numpy (the to_array was a typo I think) moved to #30038 in the mean time. Can you move your comment there if relevant?

Note that to_numeric is a function that converts any thing to a numeric type. While this function here is to convert a numeric type (this IntegerArray) to any other numpy dtype.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to_array should have been to_numpy.

pandas/core/arrays/integer.py Outdated Show resolved Hide resolved
pandas/core/arrays/integer.py Outdated Show resolved Hide resolved
pandas/core/arrays/integer.py Show resolved Hide resolved
pandas/tests/arrays/test_integer.py Outdated Show resolved Hide resolved
pandas/tests/extension/test_integer.py Outdated Show resolved Hide resolved
pandas/tests/extension/test_integer.py Outdated Show resolved Hide resolved
pandas/core/arrays/integer.py Outdated Show resolved Hide resolved
pandas/core/arrays/integer.py Outdated Show resolved Hide resolved
pandas/core/arrays/integer.py Show resolved Hide resolved
pandas/tests/arrays/test_integer.py Show resolved Hide resolved
pandas/tests/arrays/test_integer.py Outdated Show resolved Hide resolved
pandas/tests/extension/test_integer.py Outdated Show resolved Hide resolved
@TomAugspurger
Copy link
Contributor Author

Still some failing tests. Will have to put this on hold for a day or two now.

@TomAugspurger
Copy link
Contributor Author

pandas/tests/test_register_accessor.py:39: error: Module has no attribute "register_series_accessor"
pandas/tests/test_register_accessor.py:40: error: Module has no attribute "register_dataframe_accessor"
pandas/tests/test_register_accessor.py:41: error: Module has no attribute "register_index_accessor"

I'm not able to reproduce these locally.

@TomAugspurger TomAugspurger added the ExtensionArray Extending pandas with custom dtypes or arrays. label Dec 17, 2019
@jorisvandenbossche
Copy link
Member

What's the status of this? This is ready to go? (with the special cases for pow etc already merged in other PRs)

@@ -643,25 +665,30 @@ def cmp_method(self, other):
if len(self) != len(other):
raise ValueError("Lengths must match to compare")

# numpy will show a DeprecationWarning on invalid elementwise
# comparisons, this will raise in the future
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous question about this. Is this comment no longer relevant or correct? Or why was it removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, do you know how this is actually hit? If NumPy is going to raise in the future, shouldn't they be seeing that warning?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is about the warning you get with comparisons with objects / non-broadcastable arrays. Eg:

In [29]: np.array([1, 2]) == "b"   
/home/joris/miniconda3/envs/dev/bin/ipython:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  #!/home/joris/miniconda3/envs/dev/bin/python
Out[29]: False

In [30]: pd.array([1, 2]) == "b" 
Out[30]: array([False, False])

(it seems IntegerArray already handles this fine, not sure there is a explicit test for that)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems IntegerArray already handles this fine,

Gotch. It's silencing the same warning from NumPy, and falling back to invalid_comparison, which returns the expected result. I'll restore the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually... the comment is incorrect. NumPy will perform elementwise comparison in the future, not raise. If they were to raise on that in the future the implementation would be incorrect.

Though I'm still a bit confused, as the NumPy op is returning NotImplemented since we're calling it directly. Will that continue to return NotImplemented? Or will the elementwise result be different?

pandas/tests/arrays/test_integer.py Outdated Show resolved Hide resolved
@TomAugspurger
Copy link
Contributor Author

What's the status of this? This is ready to go? (with the special cases for pow etc already merged in other PRs)

I think it's ready. All the __pow__ stuff was handled elsewhere. The only pow-related changes were to add tests with pd.NA like https://github.com/pandas-dev/pandas/pull/29964/files#diff-3523d268da9f1c7d4c82d4d27241ed07R373.

@jorisvandenbossche
Copy link
Member

I am planning to merge this in a bit. I am not sure if @jreback's review was a full one, but the comments have been addressed/answered. But it would be good to have this in master for IntegerArray as well, and this is blocking some follow-ups (completing to_numpy in #30322, creating the base class). There can be more review in those PRs I think.

@jreback
Copy link
Contributor

jreback commented Dec 21, 2019

let me have a look a n hour or 2

@TomAugspurger
Copy link
Contributor Author

@jreback did you still want to / have time to take a look?

@jreback
Copy link
Contributor

jreback commented Dec 30, 2019

let me look

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. i think need a big whatsnew sub-section that explains show the implications of this change.

  1. getitem is now NA
  2. comparisions now return BooleanArray

@@ -39,6 +41,12 @@ NumPy's ``'int64'`` dtype:

pd.array([1, 2, np.nan], dtype="Int64")

All NA-like values are replaced with :attr:`pandas.NA`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may want to add a versionchanged tag here (and below)

@TomAugspurger
Copy link
Contributor Author

Thanks, added whatsnew for those changes.

@TomAugspurger
Copy link
Contributor Author

All green. Anything else @jreback or were the two in #29964 (review) it?

@jreback jreback merged commit 844dc4a into pandas-dev:master Dec 30, 2019
@jreback
Copy link
Contributor

jreback commented Dec 30, 2019

thanks @TomAugspurger

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Changing Series and DataFrame repr for NaN values
3 participants