-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ArrayManager] Array version of putmask logic #44396
[ArrayManager] Array version of putmask logic #44396
Conversation
can the fallback logic stay in internals? maybe internals.methods? there are a few funcs in internals.blocks that might go there too. |
mask=mask, | ||
new=new, | ||
) | ||
kwargs = {"mask": mask, "new": new} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not feasible to go through 'apply'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. So I forgot that ArrayManager.apply
indeed already has this "align" logic as well. However, that is not used at the moment (only putmask
and where
define align keys, and those both still go through apply_with_block
at the moment; this is also confirmed by codecov that it is unused).
But then I would rather remove it from apply
, instead of going through apply
, which will also prevent duplication. The reason for doing it in putmask: 1) apply
is already quite complex (eg also dealing with ignoring failures), so being able to remove the "alignment" handling there would be nice (can still share this later with where
), and 2) putmask
can update the existing arrays, while apply
always constructs a new manager, and 3) for the CoW branch I need to add copy-on-write logic to putmask
, which is not needed for apply
in general.
Yes, either way is fine for me. Happy to move it into internals. |
from pandas.core.arrays._mixins import NDArrayBackedExtensionArray | ||
|
||
|
||
def putmask_flexible(array: np.ndarray | ExtensionArray, mask, new): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is all of this logic from? this doesn't look like 'simple' wrappers to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's logic that for the BlockManager lives on the blocks (the Block.putmask version also calls those putmask_without_repeat
and putmask_smart
helpers from core.array_algos.putmask
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great can u then remove from where it's used now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The BlockManager is still used ..
I know this gives some duplication, but the parts that can be shared are already in pandas.core.array_algos.putmask
. Both the Block version as this version are using those functions.
I can try to see if there are some more parts that could be moved into array_algos.putmask
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so I could easily remove most of the logic of ExtensionBlock.putmask
and reuse the new helper function. For base Block.putmask
that will be more difficult, as that directly involves splitting blocks if needed etc (so Block-specific logic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me put this another way. you are adding a ton of code to methods, is this generally useful? can we reuse elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are adding a ton of code to methods, is this generally useful? can we reuse elsewhere?
No, it is only for use within array_manager.py
, but I put it in a separate internals/methods.py
file to not make array_manager.py
any longer (and on request of Brock I also didn't put it in /core/array_algos/putmask.py
because this is the custom putmask logic for internals (with fallback logic etc that we need for Series/DataFrame), and not a general putmask algo)
@@ -190,7 +191,7 @@ def __repr__(self) -> str: | |||
def apply( | |||
self: T, | |||
f, | |||
align_keys: list[str] | None = None, | |||
align_keys: list[str] | None = None, # not used for ArrayManager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not worth having a separate one for AM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean exactly? Separate signature for ArrayManager?
I initially remove this from the signature (since it is not used), but we have this method as an abstract method on the base class manager, and thus it gives typing errors if I remove it here.
from pandas.core.arrays._mixins import NDArrayBackedExtensionArray | ||
|
||
|
||
def putmask_flexible(array: np.ndarray | ExtensionArray, mask, new): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me put this another way. you are adding a ton of code to methods, is this generally useful? can we reuse elsewhere?
Can you elaborate on this a bit? If we're going to implement CoW for BM anyway, some of the duplication might be avoidable. |
See the 3 lines at https://github.com/pandas-dev/pandas/pull/41878/files#diff-b8c89656832340e9986eeee40df66eb4282555045a257e7a489d98b21b0adfbeR390-R393 that I currently added in that PR. Since putmask is used in indexing, I need to check for a copy-on-write (and this is not needed for
I think that in general the CoW detection on BlockManager will be a separate implementation (but of course difficult to say before actually trying). For example the simplest way is to do this on a block-by-block basis (which would already give different code as the 3 lines I referenced). |
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
I increasingly think that this type of change should be left until right before BlockManager is ripped out. Until then, this just duplicates code and makes it easy for behavior to accidentally drift apart. |
can you merge main |
closing as stale |
xref #39146
Trying to remove the usage of
apply_with_block
for putmask.(I also need to this for the Copy-on-Write branch to be able to perform the copy on write in putmask)