Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/API: DataFrame.iloc[:, foo] = bar inplaceness? #44353

Closed
jbrockmendel opened this issue Nov 8, 2021 · 16 comments · Fixed by #45333
Closed

BUG/API: DataFrame.iloc[:, foo] = bar inplaceness? #44353

jbrockmendel opened this issue Nov 8, 2021 · 16 comments · Fixed by #45333
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Milestone

Comments

@jbrockmendel
Copy link
Member

jbrockmendel commented Nov 8, 2021

In 1.3.0 we made it so that df["A"] = foo never operates in-place (xref whatsnew, #33457) and mostly made it so that df.loc[foo, bar] = baz tries to operate in-place before falling back to casting (xref whatsnew, #33457 (comment))

There are some remaining cases where .iloc and .loc do not try to set inplace. This issue is to make sure we have consensus about changing/deprecating that behavior. Most of the impacted test cases involve doing an astype, e.g.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(4).reshape(2, 2), dtype=object)
df.iloc[:, 0] = df.iloc[:, 0].astype(np.float64)

ATM this changes df[0] to float64, i.e. is not inplace.

In this case, the user could/should do df[0] = df[0].astype(np.float64). But that approach runs into problems with either non-unique columns or setting a slice of columns df.iloc[:, :2] = foo

My thoughts are that 1) we should be consistent about inplace-ness, not special-case the API and 2) the use case here is convenient.

Thoughts?

update Cataloguing issues/PRs where this behavior we implemented apparently-intentionally
#29393 (#25495)
#6149
#6159 BUG/API: df['col'] = value and df.loc[:,'col'] = value are now completely equivalent

update 2 Cataloguing issues that appear to be caused by/related to this inconsistency
#20635 BUG: indexing with loc and iloc with list-likes and new dtypes do not change from object dtype
#24269 DataFrame.loc/iloc fails to update object with new dtype

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 8, 2021
@jbrockmendel
Copy link
Member Author

cc @jreback @jorisvandenbossche @TomAugspurger discussed briefly on today's call could use your inputs

@mroeschke mroeschke added Index Related to the Index class or subclasses Indexing Related to indexing on series/frames, not to indexes themselves and removed Needs Triage Issue that has not been reviewed by a pandas team member Index Related to the Index class or subclasses labels Nov 14, 2021
@jbrockmendel
Copy link
Member Author

gentle ping @jreback a lot of indexing work hinges on whether we want to change this

@jreback
Copy link
Contributor

jreback commented Nov 14, 2021

API: df['col'] = value and df.loc[:,'col'] = value are now completely equivalent

i think these should be completely equivalent always
i always thought we did this

@jbrockmendel
Copy link
Member Author

i always thought we did this

Two problems here:

  1. We don't have a way for users to do this setting that is inplace.
  2. it isn't actually completely equivalent. If value has the same dtype as the existing column, it'll be inplace.

@jbrockmendel
Copy link
Member Author

xref #15631

@jbrockmendel
Copy link
Member Author

Marking as blocker for 1.4rc; we should make a decision about deprecation before then.

@jbrockmendel jbrockmendel added the Blocker for rc Blocking issue or pull request for release candidate label Dec 22, 2021
@jorisvandenbossche
Copy link
Member

Personally I think I would like the consistency of .iloc[row_indexer, column_indexer] always being inplace. But if we do that, we should provide an alternative to achieve the same: setting a column using a positional index (with new values, not in place, like df[label] = .. but using an integer index instead of a label).
Without the current iloc behaviour, this is not (easily) possible to do I think with our current APIs?

Also, if we change this, I think it should go with some deprecation cycle, since this was explicitly designed this way before (it's not that we are fixing a bug).

@jbrockmendel
Copy link
Member Author

Also, if we change this, I think it should go with some deprecation cycle, since this was explicitly designed this way before (it's not that we are fixing a bug).

Yah, that's why i'm pushing to get this done for 1.4 so we can enforce in 2.0.

should provide an alternative to achieve the same: setting a column using a positional index (with new values, not in place, like df[label] = .. but using an integer index instead of a label).

Agreed. Do you have any preferences on what that API would look like? I think something like df.iloc(axis=1)[foo] = bar might work in theory (though i doubt we have good test coverage for that)

@jbrockmendel
Copy link
Member Author

Turns out the df.iloc(axis=1)[foo]=bar doesn't work ATM bc of #45032

@jreback
Copy link
Contributor

jreback commented Dec 23, 2021

Personally I think I would like the consistency of .iloc[row_indexer, column_indexer] always being inplace. But if we do that, we should provide an alternative to achieve the same: setting a column using a positional index (with new values, not in place, like df[label] = .. but using an integer index instead of a label). Without the current iloc behaviour, this is not (easily) possible to do I think with our current APIs?

Also, if we change this, I think it should go with some deprecation cycle, since this was explicitly designed this way before (it's not that we are fixing a bug).

i don't think this is particular useful for integer indices, so -1 on adding yet another method to do the same thing.

@jbrockmendel
Copy link
Member Author

i don't think this is particular useful for integer indices, so -1 on adding yet another method to do the same thing.

The trouble is that if you have non-unique columns then there is no public way of doing this

@jreback
Copy link
Contributor

jreback commented Dec 23, 2021

i don't think this is particular useful for integer indices, so -1 on adding yet another method to do the same thing.

The trouble is that if you have non-unique columns then there is no public way of doing this

i get it.

@jreback
Copy link
Contributor

jreback commented Dec 23, 2021

I mean another option here is to go back to doing real masking instead of setitem with .loc[:, columns] (and .iloc)

but i think that creates more issues

@jbrockmendel
Copy link
Member Author

I mean another option here is to go back to doing real masking instead of setitem with .loc[:, columns] (and .iloc)

I'm not clear on what you're suggesting. In the status quo df.loc[:, foo] = bar never tries to operate in-place, is identical to df[foo] = bar. The proposal is that we should try to operate in-place, just like we would for df.loc[:-1, foo] = bar[:-1].

Under the status quo, there is no way to do a try-to-operate-in-place setting of an entire column. Under the proposed change, there is no way to do a dont-try-to-operate-in-place positional setting of an entire column.

@jreback
Copy link
Contributor

jreback commented Dec 23, 2021

I'm not clear on what you're suggesting. In the status quo df.loc[:, foo] = bar never tries to operate in-place, is identical to df[foo] = bar. The proposal is that we should try to operate in-place, just like we would for df.loc[:-1, foo] = bar[:-1].

we used to do exactly this and i changed it a long time ago because it was very confusing

@jbrockmendel
Copy link
Member Author

we used to do exactly this and i changed it a long time ago because it was very confusing

how is ".iloc[foo, bar] = baz always tries inplace" more confusing than "that, except for .iloc[:, bar] = baz which is almost never inplace, the exception being when baz has the same dtype as the existing column (but uh, not for ArrayManager or (some) ExtensionDtypes!)"

@jbrockmendel jbrockmendel removed the Blocker for rc Blocking issue or pull request for release candidate label Dec 30, 2021
@jbrockmendel jbrockmendel mentioned this issue Apr 15, 2022
1 task
@jreback jreback added this to the 1.5 milestone May 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants