
CLN/DEPS: Clean up post numpy bump to 1.12 #23796

Merged

merged 21 commits from h-vetinari:post_numpy_bump into pandas-dev:master on Nov 27, 2018

Conversation

h-vetinari
Contributor

@h-vetinari h-vetinari commented Nov 19, 2018

I stumbled over a place I had forgotten to change in #23062 - README.md. And since I recently found some grep-like powers on Windows with findstr, I took this opportunity to have a look through the code base for other things that got missed in that PR.

Turns out there was quite a bit of crusty old code left from before the last bump to numpy 1.9 (the oldest I found was some commented-out code due to segfaults for numpy < 1.5.1).

Also, since #23062 bumped matplotlib to 2.0.0, I looked for old code there as well.

There were a few places where it was not 100% clear what the right way to remove them was - I'm relying on a passing test suite to be a good enough yardstick. In any case, this should have a thorough review to make sure I didn't eff something up.

I'll add a more detailed review right away to explain my reasoning behind each one of the changes.

Contributor Author

@h-vetinari h-vetinari left a comment

Some more details on the how/why and open questions.

except Exception:
# upcast to object
dtype = np.object_
fill_value = np.nan
Contributor Author

This follows the comments in the compat code above. Not sure if this is correct, but the tests pass.
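The pattern being discussed, try to coerce the fill value and otherwise upcast to object, can be sketched with numpy alone. `promote_fill` is a hypothetical name for illustration, not the pandas function:

```python
import numpy as np

def promote_fill(dtype, fill_value):
    # hypothetical sketch of the fallback pattern above: try to coerce
    # the fill value into the given dtype; on failure, upcast to object
    try:
        fill_value = np.array([fill_value], dtype=dtype)[0]
    except (TypeError, ValueError):
        dtype = np.dtype(np.object_)
        fill_value = np.nan
    return dtype, fill_value

print(promote_fill(np.dtype('int64'), 'a string'))  # (dtype('O'), nan)
```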

@@ -408,6 +408,7 @@ def array_equivalent(left, right, strict_nan=False):

# Object arrays can contain None, NaN and NaT.
# string dtypes must be come to this path for NumPy 1.7.1 compat
# TODO: remove old numpy compat code (or comment)
Contributor Author

In this case, I'm not sure what's to be done here (expand below; no numpy code to be found)

Contributor

remove this comment

Contributor Author

I don't think that "for NumPy 1.7.1 compat" should be left standing, but ok.

@@ -3145,17 +3138,6 @@ def _merge_blocks(blocks, dtype=None, _can_consolidate=True):
return blocks


def _vstack(to_stack, dtype):
Contributor Author

This was only used once, and doesn't need to be defined at all once the shim isn't necessary anymore
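With the compat shim gone, the former helper reduces to a plain np.vstack call at its single call site; a trivial illustration of the replacement:

```python
import numpy as np

# what the removed _vstack shim boils down to once no dtype
# workaround is needed: stack the block values row-wise
blocks = [np.array([1, 2]), np.array([3, 4])]
stacked = np.vstack(blocks)
print(stacked.shape)  # (2, 2)
```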

@@ -1178,8 +1163,7 @@ def insert(self, loc, item, value, allow_duplicates=False):
blk.mgr_locs = new_mgr_locs

if loc == self._blklocs.shape[0]:
# np.append is a lot faster (at least in numpy 1.7.1), let's use it
# if we can.
# np.append is a lot faster, let's use it if we can.
Contributor Author

No code changed here, just the comment
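For anyone reading along, the behaviour the comment relies on (np.append always returns a new, longer array rather than modifying in place) looks like this; `blklocs` here is just an illustrative stand-in for `self._blklocs`:

```python
import numpy as np

blklocs = np.arange(5)
blklocs = np.append(blklocs, 5)  # returns a new array with the value appended
print(blklocs)  # [0 1 2 3 4 5]
```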

@@ -251,7 +251,6 @@ def dtype_for(t):
'complex128': np.float64,
'complex64': np.float32}

# numpy 1.6.1 compat
if hasattr(np, 'float128'):
c2f_dict['complex256'] = np.float128
Contributor Author

@h-vetinari h-vetinari Nov 19, 2018

This is not just numpy compat (hence the removed comment) - including 'complex256': np.float128 in c2f_dict does not work on my Windows machine, for example
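For context, the guard under discussion is availability-based rather than a pure numpy-version shim: np.float128 typically does not exist on Windows builds, where C long double is the same width as double. A minimal sketch of the pattern:

```python
import numpy as np

# map complex dtypes to their float counterparts; complex256/float128
# only exist on platforms whose long double is wider than 64 bits
c2f_dict = {'complex128': np.float64, 'complex64': np.float32}
if hasattr(np, 'float128'):  # absent e.g. on Windows builds of numpy
    c2f_dict['complex256'] = np.float128
```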

Member

maybe a comment to that effect? "windows compat"or "windows 32 bit compat" or "[whatever would be accurate] compat"?

Contributor Author

ok

@@ -37,7 +37,6 @@ def test_big_print(self):
def test_empty_print(self):
factor = Categorical([], ["a", "b", "c"])
expected = ("[], Categories (3, object): [a, b, c]")
# hack because array_repr changed in numpy > 1.6.x
actual = repr(factor)
assert actual == expected
Contributor Author

Fair enough to say that the repr of np.array has arrived, no? No need to call getting a list from it a hack, IMO.

@@ -94,7 +94,6 @@ class TestChaining(object):
def test_setitem_chained_setfault(self):

# GH6026
# setfaults under numpy 1.7.1 (ok on 1.8)
Contributor Author

Comment not relevant any more, IMO (issue reference is still there)

@@ -337,7 +337,7 @@ def test_iloc_setitem_list(self):
tm.assert_frame_equal(df, expected)

def test_iloc_setitem_pandas_object(self):
# GH 17193, affecting old numpy (1.7 and 1.8)
# GH 17193
Contributor Author

as above

@@ -199,8 +199,6 @@ def roundtrip(key, obj, **kwargs):
def test_long_strings(self):

# GH6166
# unconversion of long strings was being chopped in earlier
# versions of numpy < 1.7.2
Contributor Author

as above

@@ -292,11 +292,10 @@ def test_none_comparison(self):
assert not result.iat[0]
assert not result.iat[1]

# this fails for numpy < 1.9
# and oddly for *some* platforms
Contributor Author

Works for me locally, let's see what the CI says...

if values.dtype != _NS_DTYPE:
# Workaround for numpy 1.6 bug
values = conversion.ensure_datetime64ns(values)
values = conversion.ensure_datetime64ns(values)
Member

possibly pass copy=False?

Contributor Author

No problem.
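As a rough numpy-only stand-in for what pandas' conversion.ensure_datetime64ns does at this call site (the real function also takes the copy argument discussed above), assuming only the unit conversion matters:

```python
import numpy as np

NS_DTYPE = np.dtype('M8[ns]')

def to_ns(values):
    # illustrative stand-in for conversion.ensure_datetime64ns:
    # convert to nanosecond resolution only when the unit differs
    if values.dtype != NS_DTYPE:
        values = values.astype(NS_DTYPE)
    return values

days = np.array(['2018-11-27'], dtype='M8[D]')
print(to_ns(days).dtype)  # datetime64[ns]
```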

return np.c_[nz, counts[nz]]
counts = np.bincount(arr.astype(np.int_))
nz = counts.nonzero()[0]
return np.c_[nz, counts[nz]]
Member

is this used often? worth keeping as a standalone function vs inlining?

Contributor Author

Got no strong opinion on that.

Contributor

it's called twice as an operand for a loop, so ok to keep

@@ -850,12 +850,9 @@ def test_equals_op(self):
tm.assert_series_equal(series_a == item, Series(expected3))

def test_numpy_ufuncs(self):
# test ufuncs of numpy 1.9.2. see:
# test ufuncs of numpy, see:
Member

should the version number be removed or updated?

Contributor Author

Ufuncs are here to stay now, no? That's why I removed it.

Unless you mean that the set of ufuncs is related to the version number?

Contributor Author

@h-vetinari h-vetinari left a comment

Of note: I don't know many of the touched code paths, so I'm just trying to make an educated guess about what should be removed. I didn't try to go two steps further and worry about additional simplifications that might now be possible.


@@ -485,8 +485,7 @@ def astype_intsafe(ndarray[object] arr, new_dtype):
bint is_datelike
ndarray result

# on 32-bit, 1.6.2 numpy M8[ns] is a subdtype of integer, which is weird
is_datelike = new_dtype in ['M8[ns]', 'm8[ns]']
is_datelike = new_dtype == 'm8[ns]'

Member

core.dtypes.cast is the only place where this gets called, and it has a comment # work around for NumPy brokenness, #1987, might have more things worth cleaning up

Contributor Author

Removed this and the call in dtypes.cast. pandas/tests/dtypes/test_cast.py still passes, so let's try how this goes here.

Contributor Author

@h-vetinari h-vetinari left a comment

Thanks for checking this out




Contributor Author

@h-vetinari h-vetinari left a comment

Reverted a change that was causing a segfault. Solution TBD

@@ -267,6 +267,8 @@ def maybe_promote(dtype, fill_value=np.nan):
# for now: refuse to upcast datetime64
Contributor Author

replacing this compat code with

    if issubclass(dtype.type, (np.datetime64, np.timedelta64)):
        try:
            fill_value = tslibs.Timedelta(fill_value).value
        except Exception:
            # upcast to object
            dtype = np.object_
            fill_value = np.nan

leads to a segfault in tests/test_take.py::TestTake::test_2d_datetime64. have not investigated further

@jreback jreback added Clean Dependencies Required and optional dependencies labels Nov 20, 2018
@jreback
Contributor

jreback commented Nov 20, 2018

looks fine. needs a rebase and everything passing. ping when green. if something is not working after trying, can just mark it and come back with a followup.

if np.issubdtype(dtype.type, np.integer):
# TODO: this is an old numpy compat branch that is not necessary
Contributor Author

@jbrockmendel Could you have a look at this. Couldn't fix it easily, not sure how relevant that test is.

Member

Yikes. Not sure off the top whats going on here.

Why the comment saying it is no longer necessary? np.timedelta64 is still a subdtype of np.integer

To the extent that this branch is just for timedelta64 dtypes, I would make that explicit instead of checking if np.issubdtype(dtype.type, np.integer)
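The subtyping fact referenced here is easy to verify: in numpy's scalar type hierarchy, timedelta64 inherits from signedinteger, while datetime64 does not, so an integer check also catches timedelta dtypes:

```python
import numpy as np

# timedelta64 sits under signedinteger in numpy's type hierarchy,
# so an issubdtype check against np.integer also matches m8 dtypes
print(np.issubdtype(np.timedelta64, np.integer))           # True
print(np.issubdtype(np.dtype('m8[ns]').type, np.integer))  # True
print(np.issubdtype(np.datetime64, np.integer))            # False
```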

Contributor Author

The original comment (and issue) was about unsafe casting from object to int - that's not the case for timedelta types, and even more so, there's a dedicated branch for such dtypes below that is never hit (and actually leads to a test failure in tests/reshape/merge/test_merge.py::TestMerge::test_other_timedelta_unit). That's why I wanted to ask if you have an idea about it. :)

@@ -723,8 +725,19 @@ def astype_nansafe(arr, dtype, copy=True, skipna=False):

elif is_object_dtype(arr):

# work around NumPy brokenness, #1987
if np.issubdtype(dtype.type, np.integer):
Contributor

so change this to a test on is_timedelta_dtype then

Contributor Author

Reverting back to the original now contradicts this other review comment of yours, but I agree this whole thing is something for a follow-up.

Contributor

@h-vetinari as the code evolves, I re-evaluate if this is a good change or not. This got messy real fast and is not worth doing here.

Contributor Author

@jreback
That's fair enough, but - by that time - I had already reverted the code changes and only added an explanatory comment. I'd say this would be value added for anyone stumbling across these lines, but have removed this now as well.

Contributor

@jreback jreback left a comment

small comment


# rec.dtype.names = list(rec.dtype.names)[::-1]
if PY3:
# unicode error under PY2
rec.dtype.names = list(rec.dtype.names)[::-1]
Contributor Author

Not sure if this ancient code path (that has been commented out since forever) is worth keeping. It's failing on PY2, passing on PY3
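What the commented-out line does, for context: assigning to dtype.names renames the fields of a structured array in place, without moving any of the underlying data:

```python
import numpy as np

rec = np.array([(1, 2.0)], dtype=[('a', 'i8'), ('b', 'f8')])
rec.dtype.names = list(rec.dtype.names)[::-1]  # rename fields in place
print(rec.dtype.names)  # ('b', 'a')
print(rec['b'][0])      # 1 - data stays put, only the labels move
```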

# in mpl 1.5+ this is a TypeError
with pytest.raises((ValueError, TypeError)):
# MPL > 2.0.0 will most likely use TypeError here
with pytest.raises((TypeError, ValueError)):
Contributor Author

This was still raising ValueErrors in the 2.0.0 builds
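The mechanics being relied on here: pytest.raises, like an except clause, accepts a tuple of exception types and passes if any one of them is raised. The equivalent check in plain Python (`raised_one_of` is just an illustrative helper):

```python
def raised_one_of(exc_types, func, *args):
    # mirrors pytest.raises((TypeError, ValueError), ...): succeed if
    # the call raises any one of the listed exception types
    try:
        func(*args)
    except exc_types:
        return True
    return False

print(raised_one_of((TypeError, ValueError), int, 'not a number'))  # True
```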

@codecov

codecov bot commented Nov 22, 2018

Codecov Report

Merging #23796 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23796      +/-   ##
==========================================
+ Coverage   92.29%   92.31%   +0.01%     
==========================================
  Files         161      161              
  Lines       51498    51464      -34     
==========================================
- Hits        47530    47507      -23     
+ Misses       3968     3957      -11
Flag Coverage Δ
#multiple 90.7% <100%> (+0.01%) ⬆️
#single 42.42% <90.9%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/io/packers.py 88.08% <ø> (ø) ⬆️
pandas/core/arrays/categorical.py 95.34% <ø> (-0.02%) ⬇️
pandas/plotting/_misc.py 38.68% <ø> (-0.31%) ⬇️
pandas/core/arrays/datetimelike.py 95.96% <ø> (ø) ⬆️
pandas/core/arrays/sparse.py 91.93% <ø> (-0.02%) ⬇️
pandas/core/indexes/datetimes.py 96.48% <ø> (+0.28%) ⬆️
pandas/core/dtypes/dtypes.py 95.59% <100%> (-0.03%) ⬇️
pandas/core/reshape/tile.py 94.73% <100%> (ø) ⬆️
pandas/core/dtypes/cast.py 89.62% <100%> (+0.63%) ⬆️
pandas/core/internals/blocks.py 93.71% <100%> (+0.03%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a7b187a...ce5bd94. Read the comment docs.

@pep8speaks

Hello @h-vetinari! Thanks for updating the PR.

Contributor

@jreback jreback left a comment

except for all of the changes in the integer casting this is fine. remove those and can merge.

@@ -264,27 +264,22 @@ def maybe_promote(dtype, fill_value=np.nan):

# returns tuple of (dtype, fill_value)
if issubclass(dtype.type, (np.datetime64, np.timedelta64)):
# for now: refuse to upcast datetime64
# (this is because datetime64 will not implicitly upconvert
Contributor

so rather than do it this way, just make the top level 2 checks (1 for timedelta, 1 for datetime) then you don't need to check twice.

try:
fill_value = tslibs.Timedelta(fill_value).value
except Exception:
dtype = np.object_
Contributor

are these exceptions actually hit in any tests?

Contributor Author

The thing is that maybe_promote is not tested directly (see also #23833); I'm currently working on a PR for that.

Also, I've had a segfault in pandas/tests/test_take.py::TestTake::test_2d_datetime64 on an earlier attempt, but let's see.

# anymore for its original purpose (unsafe casting from object to
# int, see GH 1987).
# Currently, timedelta dtypes get routed through here; whereas
# uncommenting them would re-call (see below)
Contributor

these comments are way too long. what exactly are you trying to do here? let's leave this entire casting as is for now w/o any changes. this needs a separate followup.

Contributor Author

I'm trying to leave some breadcrumbs as I spent 2-3 hours trying to investigate this.

I've shortened the comment now.

Contributor

i would rather you not and just revert back to what it was. if you want to make an issue ok.


@jreback jreback merged commit 6af9c30 into pandas-dev:master Nov 27, 2018
@jreback
Contributor

jreback commented Nov 27, 2018

thanks @h-vetinari nice cleanup

@h-vetinari h-vetinari deleted the post_numpy_bump branch November 27, 2018 06:43
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019