PERF: Performance improvement on dataframe.update. #47407

warm200 · 2022-06-17T15:33:49Z

Under the hood, setitem makes a copy of the data (see #46267), which hurts performance.
As suggested, this PR uses loc for the assignment instead.
See the detailed example in the linked issue for metrics.
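
For readers who want to reproduce the comparison locally, here is a minimal timing sketch. It is not the benchmark from the linked issue: the helper name `time_assignment`, the frame shape, and the repeat count are all assumptions made for illustration, and the relative timings depend heavily on the installed pandas version, so no expected numbers are given.

```python
import timeit

import numpy as np
import pandas as pd


def time_assignment(n_rows=10_000, n_cols=5, repeats=3):
    """Hypothetical harness: time both column-assignment styles.

    Returns the best-of-`repeats` wall time in seconds for each style.
    Frame construction is included in both timings, so only the
    difference between the two numbers is meaningful.
    """
    rng = np.random.default_rng(0)
    data = rng.random((n_rows, n_cols))
    cols = [f"c{i}" for i in range(n_cols)]
    new_values = rng.random(n_rows)

    def with_setitem():
        df = pd.DataFrame(data.copy(), columns=cols)
        for col in cols:
            df[col] = new_values  # the style used before this PR

    def with_loc():
        df = pd.DataFrame(data.copy(), columns=cols)
        for col in cols:
            df.loc[:, col] = new_values  # the style proposed in this PR

    return {
        "setitem": min(timeit.repeat(with_setitem, number=1, repeat=repeats)),
        "loc": min(timeit.repeat(with_loc, number=1, repeat=repeats)),
    }


timings = time_assignment()
print(timings)
```

Larger values of `n_rows` and `n_cols` make any difference easier to see, at the cost of a longer run.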

closes PERF: DataFrame.update of pandas version is slower than the older version #47392

~~[Tests added and passed](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#writing-tests) if fixing a bug or adding a new feature~~

All code checks passed.

~~Added type annotations to new arguments/methods/functions.~~
~~Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.~~


warm200 · 2022-06-17T17:11:09Z

@phofl would you mind taking a look at this PR?

phofl · 2022-06-18T15:24:45Z

pandas/core/frame.py

@@ -8000,7 +8000,7 @@ def update(
            if mask.all():
                continue

-            self[col] = expressions.where(mask, this, that)
+            self.loc[:, col] = expressions.where(mask, this, that)


Since this writes into the underlying array, this is a change in behavior. Not sure if this is desirable.

warm200 · 2022-06-19

@phofl Thanks for replying. Per the docstring of update, that is what it is meant to do:
Modify in place using non-NA values from another DataFrame. Aligns on indices. There is no return value.
It is just not doing so efficiently, because since 1.4.0 the assignment goes through setitem, and this needs to be optimized.
Would you mind tagging other core contributors you think are relevant to this code to take a look?
I really think this is the right way to do it.

jreback · 2022-06-19

@phofl is right
this is an API change and cannot be backported
nor is it the right way

jreback · 2022-06-19

repeated updates are not idiomatic

warm200 · 2022-06-20

@jreback thanks for replying. I don't want to jump in and fundamentally change the API, but I want to understand your point. What I am trying to fix is the speed of the operation; I didn't mean to have other impacts.
To me, df[col] = "a" and df.loc[:, col] = "a" don't differ much, but the latter is much faster. That is the only thing I meant to change. As for "repeated updates are not idiomatic", I think that is what the existing code already does (in a for loop), and since the docstring says it updates in place, isn't the change right on purpose? df[col] = "a", on the other hand, makes a copy under the hood, which also slows it down.

I would like to hear your thoughts.
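
The two points under discussion can be checked with a small script. This is a sketch, not a statement about any particular pandas release. The first part shows the documented contract of `update` (the caller is mutated and nothing is returned). The second part reports whether each assignment style reuses the column's existing buffer. The helper names `reuses_buffer`, `via_setitem`, and `via_loc` are made up for illustration. Whether the buffer is reused varies between pandas versions, which is why the script only prints the result instead of asserting it.

```python
import numpy as np
import pandas as pd

# 1) update() mutates the caller and returns None.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
other = pd.DataFrame({"a": [np.nan, 200.0, np.nan]})
result = df.update(other)
print(result, df["a"].tolist())


# 2) Does a column assignment reuse the existing buffer?
def reuses_buffer(assign):
    """Hypothetical probe: apply `assign` and compare buffers before/after."""
    frame = pd.DataFrame({"a": [1.0, 2.0, 3.0]})
    before = frame["a"].to_numpy()
    assign(frame, np.array([7.0, 8.0, 9.0]))
    shared = np.shares_memory(before, frame["a"].to_numpy())
    return shared, frame["a"].tolist()


def via_setitem(frame, values):
    frame["a"] = values


def via_loc(frame, values):
    frame.loc[:, "a"] = values


print("pandas", pd.__version__)
print("setitem reuses buffer:", reuses_buffer(via_setitem)[0])
print("loc reuses buffer:", reuses_buffer(via_loc)[0])
```

Both styles leave the same values in the frame. The difference the reviewers point to is only visible to code that holds a reference to the old array.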

simonjayhawkins · 2022-06-21T18:17:37Z

@warm200 Thanks for the PR. #47327 has been merged, which includes the change in this PR, so closing.

PERF: Performance improvement on dataframe.update. setitem is doing c…

30d3215

…opy under the hood and it's not updated in place

phofl reviewed Jun 18, 2022


warm200 requested a review from jreback June 20, 2022 03:16

simonjayhawkins mentioned this pull request Jun 21, 2022

REGR: Fix fillna making a copy when dict was given as fill value and inplace is set #47327

Merged


simonjayhawkins closed this Jun 21, 2022
