Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REF: avoid mutating Series._values directly in setitem but defer to Manager method #41879

Conversation

jorisvandenbossche
Copy link
Member

There are currently two cases where the Series code directly modifies the underlying values itself (self._values[..] = ..), while most setitem code paths end up going through a specific indexing method that all end up in the Manager doing the actual modification of the values.

Apart from being a bit inconsistent (it's good to have all eventual modifications consistently happening inside the manager, I think), I also need to move this to the manager for being able to check for copy-on-write in #41878.
Since I think this is a generally useful change, moving it out of the CoW PR.

@jorisvandenbossche jorisvandenbossche added Refactor Internal refactoring of code Indexing Related to indexing on series/frames, not to indexes themselves labels Jun 8, 2021
pandas/core/series.py Outdated Show resolved Hide resolved
@jbrockmendel
Copy link
Member

I lean towards using a new method instead of overloading setitem, which in general can cast.

@jorisvandenbossche
Copy link
Member Author

let's try not to overload setitem (e.g. its always inplace), so not in place is odd.

@jreback to be clear, it's the existing manager setitem method that is not inplace (the odd thing), and I am adding a variant that is inplace .

I lean towards using a new method instead of overloading setitem, which in general can cast.

I can certainly add a new method (eg setitem_inplace(..)) instead of adding a keyword (setitem(.., inplace=True)), no problem. But just note that this is really doing the same operation. It's just that in some cases, we know that the setitem will in practice be inplace, and we can be stricter in that case by not going through the general Block method that indeed can cast (that optimization was now done inside Series, and this moves it to the manager). Given that it is about the same operation but with different semantics, I think a keyword actually makes sense.

@jorisvandenbossche
Copy link
Member Author

The last commits adds the version with a separate method instead of keyword. Either way is fine for me (it's also easy to revert)

@jbrockmendel
Copy link
Member

BTW im picky on the implementation, but in general i think this is a very good idea

@jorisvandenbossche
Copy link
Member Author

Any further response here?

@jreback
Copy link
Contributor

jreback commented Jun 11, 2021

i will look at some point

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@jorisvandenbossche
Copy link
Member Author

This is not stale, but waiting on further comments.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable to me. can you rebase and small comments.

@@ -1287,8 +1287,24 @@ def apply(self, func, **kwargs):
return type(self)([new_array], self._axes)

def setitem(self, indexer, value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you type annotate the return values at the very least on these (Any is fine) just indicative

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an existing method, all other Manager methods with general indexer argument are not yet typed, so I would prefer to leave that for a typing-specific PR (I also thought that we generally want to avoid adding Any for "not yet typed" unless it's really "any").

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small comment, otherwise lgtm.

@@ -1287,8 +1287,24 @@ def apply(self, func, **kwargs):
return type(self)([new_array], self._axes)

def setitem(self, indexer, value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk

@@ -159,6 +159,18 @@ def array(self):
"""
return self.arrays[0] # type: ignore[attr-defined]

def setitem_inplace(self, indexer, value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you type the output here (e.g. -> None)

@jreback jreback added this to the 1.4 milestone Aug 4, 2021
@jreback
Copy link
Contributor

jreback commented Aug 5, 2021

i think the failures are related here, e.g. the stacklevel is changed. so prob need to use find_stack_level instead of hard-coding for a lot of the indexing tests.

@jreback
Copy link
Contributor

jreback commented Aug 5, 2021

i think the failures are related here, e.g. the stacklevel is changed. so prob need to use find_stack_level instead of hard-coding for a lot of the indexing tests.

actually these are likley #42857

but this PR may affect as well (so let's fix that one first)

Copy link
Member

@jbrockmendel jbrockmendel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jorisvandenbossche jorisvandenbossche merged commit 3d135f3 into pandas-dev:master Aug 7, 2021
@jorisvandenbossche jorisvandenbossche deleted the ref-indexing-series-setitem-inplace branch August 7, 2021 16:32
feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants