Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: is_copy arg of take #30615

Merged
merged 13 commits into from
Jan 6, 2020

Conversation

ryankarlos
Copy link
Contributor

@ryankarlos ryankarlos commented Jan 2, 2020

@ryankarlos ryankarlos changed the title deprecate is_copy argument of take and remove is_copy=False usage DEPR: dis_copy argument of take Jan 2, 2020
@ryankarlos ryankarlos changed the title DEPR: dis_copy argument of take DEPR: is_copy arg of take Jan 2, 2020
@ryankarlos
Copy link
Contributor Author

Not sure why my changes are making these other tests break ?
Failed: DID NOT RAISE <class 'pandas.core.common.SettingWithCopyError'>

@jreback jreback added the Deprecate Functionality to remove in pandas label Jan 3, 2020
@jreback
Copy link
Contributor

jreback commented Jan 3, 2020

looks good, needs a rebase; ping when green.

@pep8speaks
Copy link

pep8speaks commented Jan 3, 2020

Hello @ryankarlos! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-06 02:08:13 UTC

@ryankarlos ryankarlos force-pushed the DEPR/is_copy_arg_of_take branch 2 times, most recently from c9d5f26 to 4ed4028 Compare January 3, 2020 01:44
@jreback jreback added this to the 1.0 milestone Jan 3, 2020
@ryankarlos
Copy link
Contributor Author

ryankarlos commented Jan 3, 2020

@jreback do you know why the depr warning is making the test fail specifically for Linuxpy37 ?

pandas/tests/frame/methods/test_asof.py:27: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/core/generic.py:7009: in asof
    data = self.take(locs, is_copy=False)

and if i set this to data = self.take(locs) , it raises a SettingWithCopyError for all the builds due to this write on copy data.loc[missing] = np.nan

        if value == "raise":
>           raise com.SettingWithCopyError(t)
E           pandas.core.common.SettingWithCopyError: 
E           A value is trying to be set on a copy of a slice from a DataFrame.
E           Try using .loc[row_indexer,col_indexer] = value instead
E           
E           See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

pandas/core/generic.py:3684: SettingWithCopyError

@TomAugspurger
Copy link
Contributor

The numpydev py37 build is the only one where we elevate warnings to errors which fail the build.

Most likely you'll need to look at that test and what is_copy is doing. If the test looks correct, you made need to replicate some of the is_copy=False behavior in the places that previously passed is_copy=False.

@ryankarlos
Copy link
Contributor Author

ryankarlos commented Jan 4, 2020

thanks. @TomAugspurger @jreback found a workaround - all green now - let me know if any more changes needed.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated routines, df.asof should not be showing a warning

with warnings.catch_warnings():
warnings.simplefilter("ignore", FutureWarning)

result = df.asof(dates)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this showing a warning? you need to fix internally so this is not the case

Copy link
Contributor Author

@ryankarlos ryankarlos Jan 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the self.take(locs, is_copy=False) in def asof will now generate a warning after the deprecation change.
I could change it to default None (which sets is_copy=True) - but then that generates a SettingWithCopyError due to data.loc[missing] = np.nan in the block below

missing = locs == -1
data = self.take(locs)
data.index = where
data.loc[missing] = np.nan

I suppose I could temporarily suppress it

with pd.option_context('mode.chained_assignment', None):
            data.loc[missing] = np.nan

or create a deep copy after the take operation but not sure if thats the best approach ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you simply step theu the code and out a copy where needed

expected = df.asof(dates)
tm.assert_frame_equal(result, expected)
with warnings.catch_warnings():
warnings.simplefilter("ignore", FutureWarning)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@jreback jreback merged commit 7796be6 into pandas-dev:master Jan 6, 2020
@jreback
Copy link
Contributor

jreback commented Jan 6, 2020

thanks @ryankarlos

@jorisvandenbossche
Copy link
Member

This might have caused a regression in asof: https://pandas.pydata.org/speed/pandas/#timeseries.AsOf.time_asof?p-constructor='DataFrame'&commits=2a3d840d-c99dfea3 (note there are a bunch of other commits in the time span of the regression according to asv, but this is the most obvious one that touched asof)

@@ -7011,7 +7023,8 @@ def asof(self, where, subset=None):

# mask the missing
missing = locs == -1
data = self.take(locs, is_copy=False)
d = self.take(locs)
data = d.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were there test failures if you didn't do this d.copy() step?

Copy link
Contributor Author

@ryankarlos ryankarlos Jan 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was breaking builds with SettingwithCopyError see #30615 (comment) and lower bit of #30615 (comment)

@jorisvandenbossche
Copy link
Member

Yes, it was breaking builds with SettingwithCopyError #30615 (comment)

I think this points to a problem in the current deprecation.

IMO, the goal of removing is_copy is to make this keyword unnecessary, in the idea that take should always return a copy. So that means you should never need to manually copy after doing a take.

WIth the current deprecation however, the user seems to loose this ability of not needing to take a copy. So I think, when deprecating/removing the is_copy arg, it should rather be the behaviour of is_copy=False that we should keep as the default.

@jreback
Copy link
Contributor

jreback commented Jan 8, 2020

the purpose of is_copy was to force a copy
take never had the always copy behavior of numpy - but it was mostly an internal method anyhow (eg you can use iloc)

so prob ok just to leave as is (maybe add a doc string)

@ryankarlos ryankarlos deleted the DEPR/is_copy_arg_of_take branch January 8, 2020 03:24
@jorisvandenbossche
Copy link
Member

the purpose of is_copy was to force a copy

I don't think this is correct, as is_copy=True/False did not influence whether a copy was made or not. What is_copy=True did, is setting self._is_copy to indicate to pandas it is referencing another object. So actually is_copy=True made it "less" a copy than is_copy=False.

With is_copy=False, you could indicate to pandas "this is really a copy already, don't give me annoying SettingWithCopyWarnings" without the need to do another explicit copy.

(the name of is_copy and the explanation in the docstring is just very confusing / wrong)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DEPR: .take(..., is_copy=True); rename is_copy -> copy
6 participants