-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN/DEPS: Clean up post numpy bump to 1.12 #23796
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more details on the how/why and open questions.
pandas/core/dtypes/cast.py
Outdated
except Exception: | ||
# upcast to object | ||
dtype = np.object_ | ||
fill_value = np.nan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This follows the comments in the compat code above. Not sure if this is correct, but the tests pass.
pandas/core/dtypes/missing.py
Outdated
@@ -408,6 +408,7 @@ def array_equivalent(left, right, strict_nan=False): | |||
|
|||
# Object arrays can contain None, NaN and NaT. | |||
# string dtypes must be come to this path for NumPy 1.7.1 compat | |||
# TODO: remove old numpy compat code (or comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, I'm not sure what's to be done here (expand below; no numpy code to be found)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that "for NumPy 1.7.1 compat" should be left standing, but ok.
@@ -3145,17 +3138,6 @@ def _merge_blocks(blocks, dtype=None, _can_consolidate=True): | |||
return blocks | |||
|
|||
|
|||
def _vstack(to_stack, dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was only used once, and doesn't need to be defined at all once the shim isn't necessary anymore
@@ -1178,8 +1163,7 @@ def insert(self, loc, item, value, allow_duplicates=False): | |||
blk.mgr_locs = new_mgr_locs | |||
|
|||
if loc == self._blklocs.shape[0]: | |||
# np.append is a lot faster (at least in numpy 1.7.1), let's use it | |||
# if we can. | |||
# np.append is a lot faster, let's use it if we can. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No code changed here, just the comment
@@ -251,7 +251,6 @@ def dtype_for(t): | |||
'complex128': np.float64, | |||
'complex64': np.float32} | |||
|
|||
# numpy 1.6.1 compat | |||
if hasattr(np, 'float128'): | |||
c2f_dict['complex256'] = np.float128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not just numpy compat (hence the removed comment) - including 'complex256': np.float128
into c2f_dict
does not work on my windows machine, for example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe a comment to that effect? "windows compat"or "windows 32 bit compat" or "[whatever would be accurate] compat"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -37,7 +37,6 @@ def test_big_print(self): | |||
def test_empty_print(self): | |||
factor = Categorical([], ["a", "b", "c"]) | |||
expected = ("[], Categories (3, object): [a, b, c]") | |||
# hack because array_repr changed in numpy > 1.6.x | |||
actual = repr(factor) | |||
assert actual == expected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough to say that the repr of np.array has arrived, no? no need to call getting a list from it a hack, IMO
@@ -94,7 +94,6 @@ class TestChaining(object): | |||
def test_setitem_chained_setfault(self): | |||
|
|||
# GH6026 | |||
# setfaults under numpy 1.7.1 (ok on 1.8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment not relevant any more, IMO (issue reference is still there)
@@ -337,7 +337,7 @@ def test_iloc_setitem_list(self): | |||
tm.assert_frame_equal(df, expected) | |||
|
|||
def test_iloc_setitem_pandas_object(self): | |||
# GH 17193, affecting old numpy (1.7 and 1.8) | |||
# GH 17193 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above
@@ -199,8 +199,6 @@ def roundtrip(key, obj, **kwargs): | |||
def test_long_strings(self): | |||
|
|||
# GH6166 | |||
# unconversion of long strings was being chopped in earlier | |||
# versions of numpy < 1.7.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as above
@@ -292,11 +292,10 @@ def test_none_comparison(self): | |||
assert not result.iat[0] | |||
assert not result.iat[1] | |||
|
|||
# this fails for numpy < 1.9 | |||
# and oddly for *some* platforms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me locally, let's see what the CI says...
pandas/core/internals/blocks.py
Outdated
if values.dtype != _NS_DTYPE: | ||
# Workaround for numpy 1.6 bug | ||
values = conversion.ensure_datetime64ns(values) | ||
values = conversion.ensure_datetime64ns(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possibly pass copy=False?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem.
return np.c_[nz, counts[nz]] | ||
counts = np.bincount(arr.astype(np.int_)) | ||
nz = counts.nonzero()[0] | ||
return np.c_[nz, counts[nz]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used often? worth keeping as a standalone function vs inlining?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got no strong opinion on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its called twice as an operand for a loop, so ok to keep
@@ -850,12 +850,9 @@ def test_equals_op(self): | |||
tm.assert_series_equal(series_a == item, Series(expected3)) | |||
|
|||
def test_numpy_ufuncs(self): | |||
# test ufuncs of numpy 1.9.2. see: | |||
# test ufuncs of numpy, see: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the version number be removed or updated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ufuncs are here to stay now, no? That's why I removed it.
Unless you mean that the set of ufuncs is related to the version number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of note: I don't know many of the touched code paths, so I'm just trying to make an educated guess what should be removed. Didn't try to go 2 steps beyond and worry about further simplifications that might now be possible
pandas/core/internals/blocks.py
Outdated
if values.dtype != _NS_DTYPE: | ||
# Workaround for numpy 1.6 bug | ||
values = conversion.ensure_datetime64ns(values) | ||
values = conversion.ensure_datetime64ns(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem.
return np.c_[nz, counts[nz]] | ||
counts = np.bincount(arr.astype(np.int_)) | ||
nz = counts.nonzero()[0] | ||
return np.c_[nz, counts[nz]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got no strong opinion on that.
pandas/_libs/lib.pyx
Outdated
@@ -485,8 +485,7 @@ def astype_intsafe(ndarray[object] arr, new_dtype): | |||
bint is_datelike | |||
ndarray result | |||
|
|||
# on 32-bit, 1.6.2 numpy M8[ns] is a subdtype of integer, which is weird | |||
is_datelike = new_dtype in ['M8[ns]', 'm8[ns]'] | |||
is_datelike = new_dtype == 'm8[ns]' | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
core.dtypes.cast is the only place where this gets called, and it has a comment # work around for NumPy brokenness, #1987
, might have more things worth cleaning up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this and the call in dtypes.cast
. pandas/tests/dtypes/test_cast.py
still passes, so let's try how this goes here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking this out
pandas/_libs/lib.pyx
Outdated
@@ -485,8 +485,7 @@ def astype_intsafe(ndarray[object] arr, new_dtype): | |||
bint is_datelike | |||
ndarray result | |||
|
|||
# on 32-bit, 1.6.2 numpy M8[ns] is a subdtype of integer, which is weird | |||
is_datelike = new_dtype in ['M8[ns]', 'm8[ns]'] | |||
is_datelike = new_dtype == 'm8[ns]' | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this and the call in dtypes.cast
. pandas/tests/dtypes/test_cast.py
still passes, so let's try how this goes here.
@@ -251,7 +251,6 @@ def dtype_for(t): | |||
'complex128': np.float64, | |||
'complex64': np.float32} | |||
|
|||
# numpy 1.6.1 compat | |||
if hasattr(np, 'float128'): | |||
c2f_dict['complex256'] = np.float128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -850,12 +850,9 @@ def test_equals_op(self): | |||
tm.assert_series_equal(series_a == item, Series(expected3)) | |||
|
|||
def test_numpy_ufuncs(self): | |||
# test ufuncs of numpy 1.9.2. see: | |||
# test ufuncs of numpy, see: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ufuncs are here to stay now, no? That's why I removed it.
Unless you mean that the set of ufuncs is related to the version number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted a change that was causing a segfault. Solution TBD
pandas/core/dtypes/cast.py
Outdated
@@ -267,6 +267,8 @@ def maybe_promote(dtype, fill_value=np.nan): | |||
# for now: refuse to upcast datetime64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replacing this compat code with
if issubclass(dtype.type, (np.datetime64, np.timedelta64)):
try:
fill_value = tslibs.Timedelta(fill_value).value
except Exception:
# upcast to object
dtype = np.object_
fill_value = np.nan
leads to a segfault in tests/test_take.py::TestTake::test_2d_datetime64
. have not investigated further
looks fine. needs a rebase and everything passing. ping when green. if something is not working after trying, can just mark it and come back with a followup. |
pandas/core/dtypes/cast.py
Outdated
if np.issubdtype(dtype.type, np.integer): | ||
# TODO: this is an old numpy compat branch that is not necessary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel Could you have a look at this. Couldn't fix it easily, not sure how relevant that test is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes. Not sure off the top whats going on here.
Why the comment saying it is no longer necessary? np.timedelta64 is still a subdtype of np.integer
To the extent that this branch is just for timedelta64 dtypes, I would make that explicit instead of checking if np.issubdtype(dtype.type, np.integer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original comment (and issue) was about unsafe casting from object to int - that's not the case for timedelta types, and even more so, there's a dedicated branch for such dtypes below that is never hit (and actually leads to a test failure in tests/reshape/merge/test_merge.py::TestMerge:;test_other_timedelta_unit
). That's why I wanted to ask if you have an idea about it. :)
pandas/core/dtypes/cast.py
Outdated
@@ -723,8 +725,19 @@ def astype_nansafe(arr, dtype, copy=True, skipna=False): | |||
|
|||
elif is_object_dtype(arr): | |||
|
|||
# work around NumPy brokenness, #1987 | |||
if np.issubdtype(dtype.type, np.integer): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so change this to a test on is_timedelta_dtype then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverting back to original now contradicts this other review comment of yours. But I agree this whole thing is something for a follow-up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@h-vetinari as the code evolves, I re-evaluate if this is a good change or not. This got messy real fast and is not worth doing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback
That's fair enough, but - by that time - I had already reverted the code changes and only added an explanatory comment. I'd say this would be value added for anyone stumbling across these lines, but have removed this now as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comment
pandas/core/dtypes/cast.py
Outdated
if np.issubdtype(dtype.type, np.integer): | ||
# TODO: this is an old numpy compat branch that is not necessary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original comment (and issue) was about unsafe casting from object to int - that's not the case for timedelta types, and even more so, there's a dedicated branch for such dtypes below that is never hit (and actually leads to a test failure in tests/reshape/merge/test_merge.py::TestMerge:;test_other_timedelta_unit
). That's why I wanted to ask if you have an idea about it. :)
# rec.dtype.names = list(rec.dtype.names)[::-1] | ||
if PY3: | ||
# unicode error under PY2 | ||
rec.dtype.names = list(rec.dtype.names)[::-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this ancient code path (that has been commented out since forever) is worth keeping. It's failing on PY2
, passing on PY3
# in mpl 1.5+ this is a TypeError | ||
with pytest.raises((ValueError, TypeError)): | ||
# MPL > 2.0.0 will most likely use TypeError here | ||
with pytest.raises((TypeError, ValueError)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was still raising ValueErrors in the 2.0.0 builds
Codecov Report
@@ Coverage Diff @@
## master #23796 +/- ##
==========================================
+ Coverage 92.29% 92.31% +0.01%
==========================================
Files 161 161
Lines 51498 51464 -34
==========================================
- Hits 47530 47507 -23
+ Misses 3968 3957 -11
Continue to review full report at Codecov.
|
Hello @h-vetinari! Thanks for updating the PR.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
except for all of the changes in the integer casting this is fine. remove those and can merge.
@@ -264,27 +264,22 @@ def maybe_promote(dtype, fill_value=np.nan): | |||
|
|||
# returns tuple of (dtype, fill_value) | |||
if issubclass(dtype.type, (np.datetime64, np.timedelta64)): | |||
# for now: refuse to upcast datetime64 | |||
# (this is because datetime64 will not implicitly upconvert |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so rather than do it this way, just make the top level 2 checks (1 for timedelta, 1 for datetime) then you don't need to check twice.
pandas/core/dtypes/cast.py
Outdated
try: | ||
fill_value = tslibs.Timedelta(fill_value).value | ||
except Exception: | ||
dtype = np.object_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these exceptions actually hit in any tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is that maybe_promote
is not tested directly (see also #23833), I'm working on a PR for that currently.
Also, I've had a segfault in pandas/tests/test_take.py::TestTake::test_2d_datetime64
on an earlier attempt, but let's see.
pandas/core/dtypes/cast.py
Outdated
# anymore for its original purpose (unsafe casting from object to | ||
# int, see GH 1987). | ||
# Currently, timedelta dtypes get routed through here; whereas | ||
# uncommenting them would re-call (see below) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these comments are way too long. what exactly are you trying to do here? let's leave this entire casting as is for now w/o any changes. this needs a separate followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to leave some breadcrumbs as I spent 2-3 hours trying to investigate this.
I've shortened the comment now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would rather you not and just revert back to what it was. if you want to make an issue ok.
pandas/core/dtypes/missing.py
Outdated
@@ -408,6 +408,7 @@ def array_equivalent(left, right, strict_nan=False): | |||
|
|||
# Object arrays can contain None, NaN and NaT. | |||
# string dtypes must be come to this path for NumPy 1.7.1 compat | |||
# TODO: remove old numpy compat code (or comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this comment
thanks @h-vetinari nice cleanup |
git diff upstream/master -u -- "*.py" | flake8 --diff
I stumbled over a place I had forgotten to change in #23062 -
README.md
. And since I recently found somegrep
-like powers on windows withfindstr
, I took this opportunity to have a look through the code base for other things that got missed in that PR.Turns out there was quite a bit of crusty old code left from before the last bump to
numpy 1.9
(the oldest I found was some commented out code due to segfaults fornumpy < 1.5.1
).Also, since #23062 bumped matplotlib to
2.0.0
, I looked for old code there as well.There were a few places where it was not 100% clear what the right way to remove them was - I'm relying on a passing test suite to be a good enough yardstick. In any case, this should have a thorough review to make sure I didn't eff something up.
I'll add a more detailed review right away to explain my reasoning behind each one of the changes.