Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: inplace behavior is inconsist for fillna #47188

Closed
3 tasks done
Yikun opened this issue Jun 1, 2022 · 11 comments · Fixed by #47327
Closed
3 tasks done

BUG: inplace behavior is inconsist for fillna #47188

Yikun opened this issue Jun 1, 2022 · 11 comments · Fixed by #47327
Labels
Bug Copy / view semantics Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@Yikun
Copy link
Contributor

Yikun commented Jun 1, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np
pdf = pd.DataFrame({"x": [np.nan, 2, 3, 4, np.nan, 6], "y": [np.nan, 2, 3, 4, np.nan, 6]})
pser = pdf.x
# Didn't make a copy write to existing  (it's unexpected behavior)
pdf.fillna(0, inplace=True)
>>> pser
0    0.0
1    2.0
2    3.0
3    4.0
4    0.0
5    6.0


import pandas as pd
import numpy as np
pdf = pd.DataFrame({"x": [np.nan, 2, 3, 4, np.nan, 6], "y": [np.nan, 2, 3, 4, np.nan, 6]})
pser = pdf.x
# Make a copy and doesn't write to existing (it's expected behavior)
>>> pdf.fillna({"x": -1, "y": -2}, inplace=True)
>>> pser
0    NaN
1    2.0
2    3.0
3    4.0
4    NaN
5    6.0
Name: x, dtype: float64

Issue Description

Looks like behavior is a little bit differet in current pandas.

Expected Behavior

Always make a copy and doesn't write to existing array for Do in-place operation in all functions with inplace (such like eval/update/fillna).

Installed Versions

1.4+

@Yikun Yikun added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 1, 2022
@Yikun Yikun changed the title BUG: inplace behavior BUG: inplace behavior is inconsist for fillna Jun 1, 2022
@Yikun
Copy link
Contributor Author

Yikun commented Jun 1, 2022

This change is related to 03dd698 . As #46800 (comment) mentioned: Setitem should always make a copy and never write into the existing array.. This behavior change also has some influence on eval/update/fillna with place/setitem method.

I assumed eval/update/fillna with place/setitem has same behavior will always make a copy like setitem, but looks like it's a little bit inconsist from the PR example.

@Yikun
Copy link
Contributor Author

Yikun commented Jun 1, 2022

cc @jbrockmendel @phofl

@jbrockmendel
Copy link
Member

Looks like at the end of Block.fillna adding a if inplace: return nbs before the return extend_blocks( fixes this. Haven't run the test suite with this edit.

@Yikun
Copy link
Contributor Author

Yikun commented Jun 2, 2022

@jbrockmendel Thanks for quick feedback!

IIUC, do you mean only setitem should return new copy df?

# should make a copy
pdf["a"] = pdf["a"] + 10

But for method like eval/update/fillna, should operate inplace rather than creating a new copy df?

# operate inplace
df.fillna({"x": -1, "y": -2, "z": -5}, inplace=True)
df.update(new_df)
df.eval("A = B + C", inplace=True)

@simonjayhawkins
Copy link
Member

based on the title of this issue, i'm labelling as a regression and milestoning 1.4.x since both operations in the OP were consistent in pandas-1.3.5.

of course, if the change for the second case is intentional and the first case is a bug that was not fixed by #43406, then could also choose not to backport any fix.

if the second is considered the bug, then any fix should definitely be backported to restore the 1.3.5 behavior

@simonjayhawkins simonjayhawkins added Regression Functionality that used to work in a prior pandas version Copy / view semantics and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 2, 2022
@simonjayhawkins simonjayhawkins added this to the 1.4.3 milestone Jun 2, 2022
@Yikun
Copy link
Contributor Author

Yikun commented Jun 7, 2022

@simonjayhawkins Thanks for reply! It seems the fillnan, update, eval behavior changes are accidently.

Besides the code of fillnan in PR, below is the test code for update and eval:

import pandas as pd
import numpy as np

# Example for update
pdf = pd.DataFrame({"A": ["1", "2", "3", "4"], "B": ["100", "200", np.nan, np.nan]}, columns=["A", "B"])
new_pdf = pd.DataFrame({"B": ["x", np.nan, "y", np.nan], "C": ["100", "200", "300", "400"]},columns=["B", "C"])
pser = pdf.B
pdf.update(new_pdf)
# pser != pdf.B after 1.4

# Example for eval
pdf = pd.DataFrame({"A": range(1, 6), "B": range(10, 0, -2), "C": range(11, 16)})
pser = pdf.A
pdf.eval("A = B + C", inplace=True)
# pser != pdf.A after 1.4

@Yikun
Copy link
Contributor Author

Yikun commented Jun 13, 2022

@simonjayhawkins @jbrockmendel Would this issue be fixed in 1.4.3?

What should we really expect the behavior to be? According to history comments and discussion, I assume #47188 (comment) , eval/update/fillna should operate in-place, right?

@phofl
Copy link
Member

phofl commented Jun 13, 2022

I think updating inplace is sensible here? This is a subtle change so I think we should backport, because the fact that we are using setitem under the hood should not leak into the interface. If we want to change this, we should deprecate, but only makes sens if we don't get Copy on Write in for 2.0

@Yikun
Copy link
Contributor Author

Yikun commented Jun 21, 2022

@phofl Thanks for explanation, I updated the PR description, looks like the PR only fix the fillna. If needed, I'd like to raise two separate issues for eval and update (#47188 (comment)).

@simonjayhawkins
Copy link
Member

I'd like to raise two separate issues for eval and update (#47188 (comment)).

fix for update included in #47327

@simonjayhawkins
Copy link
Member

I'd like to raise two separate issues for eval and update (#47188 (comment)).

opened #47449 for eval

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Copy / view semantics Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants