Add inplace option to DataFrame.update() #10730

Closed
dov opened this issue Aug 3, 2015 · 9 comments
Labels
Enhancement inplace Relating to inplace parameter or equivalent Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Needs Discussion Requires discussion from core team before further action

Comments

@dov

dov commented Aug 3, 2015

Currently DataFrame.update() modifies the DataFrame in place. I feel this is inconsistent with most other methods, e.g. set_index(), which return a new DataFrame. It is probably too late to change this default behavior (is it?), but if at the very least an inplace option were added (defaulting to True to preserve the current behaviour), it would be easier to chain a sequence of operations that includes update().

@jreback
Contributor

jreback commented Aug 3, 2015

Can you show an example of your use case? .update has a very limited use case; often .combine/.combine_first/.assign can substitute.

As far as the default goes, that ship has sailed. It would be better to introduce a .updated that defaults to returning a new object (and has an inplace parameter), and deprecate .update. Would take a PR for that.
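
For illustration only, here is a minimal sketch of what a non-inplace variant could look like today. The `updated` helper is hypothetical and not part of pandas; it simply copies the frame and calls the existing `DataFrame.update` on the copy. The sample data is made up, and float columns are used to sidestep dtype questions.

```python
import pandas as pd


def updated(df, other, **kwargs):
    """Hypothetical helper (not a pandas API): non-inplace DataFrame.update."""
    result = df.copy()              # leave the caller's frame untouched
    result.update(other, **kwargs)  # update() mutates `result` and returns None
    return result


df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
other = pd.DataFrame({"b": [50.0]}, index=[1])

# .pipe lets the helper take part in a method chain
new_df = df.pipe(updated, other)
print(new_df)
```

Because the helper returns a new object, it can be followed directly by further chained calls, which is what the original request is after.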

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate API Design labels Aug 3, 2015
@jreback jreback added this to the Next Major Release milestone Aug 3, 2015
@jreback jreback added the Needs Discussion Requires discussion from core team before further action label Aug 3, 2015
@dov
Author

dov commented Aug 3, 2015

I first encountered update as an answer to my Stack Overflow question:

http://stackoverflow.com/questions/31769754/looking-up-multiple-values-from-a-pandas-dataframe/31779537#31779537

regarding how to do a "parallel lookup" of all rows of one DataFrame by the values of another DataFrame.

In that example it might be useful to, e.g., write the resulting DataFrame to a CSV file directly after the call to update().

@Safrone

Safrone commented Apr 24, 2019

I had a similar feeling. Is there a way to do this in a more functional way?

@giuliobeseghi

> can you show an example of your use case. .update has a very limited use case, often .combine/.combine_first/.assign can substitute.
>
> As far as the default, this ship has sailed, would be better to introduce a .updated that defaults to returning a new object (and has an inplace parameter), and deprecate .update. Would take a PR for that.

My use case is:

# combine three datetime series referring to periods that partially overlap
overall_forecast = long_term_forecast.update(medium_term_forecast).update(short_term_forecast)

I can't use loc because the index of one series is not necessarily contained in another.

@Liam3851
Contributor

@giuliobeseghi If I understand correctly, it sounds like your use case would be solved by using combine_first instead of update?

Combine Series values, choosing the calling Series's values
first. Result index will be the union of the two indexes

Parameters
----------
other : Series
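
As a rough sketch of that suggestion, the three-forecast case could be written with the highest-priority series first, since `combine_first` prefers the caller's non-null values. The series names and values below are made up, and integer labels stand in for the datetime index of the original use case.

```python
import pandas as pd

# Illustrative stand-ins for the three forecasts
long_term = pd.Series([1.0, 2.0, 3.0, 4.0], index=[0, 1, 2, 3])
medium_term = pd.Series([20.0, 30.0], index=[1, 2])
short_term = pd.Series([300.0, 500.0], index=[2, 4])

# Most specific forecast first; each call fills the remaining gaps
# from the next, lower-priority series.
overall = short_term.combine_first(medium_term).combine_first(long_term)
print(overall)
```

Unlike `update`, the result index is the union of all indexes, so label 4 (present only in `short_term` here) appears in the result.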

@giuliobeseghi

> @giuliobeseghi If I understand correctly, it sounds like your use case would be solved by using combine_first instead of update?
>
> Combine Series values, choosing the calling Series's values
> first. Result index will be the union of the two indexes
>
> Parameters
> ----------
> other : Series

You are right.

Thanks for that. I got confused by the example: I might have to keep the NaNs of the most recent series. I realised I can do it with reindex + loc:

import pandas as pd
import numpy as np

s = pd.Series([1, 2, 3, 4, 5])
t = pd.Series([10, 11, 12], index=range(3, 6))
t.iloc[1] = np.nan
print(t)

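# Index.union replaces the `|` operator, which pandas 2.x no longer treats as a set union
res = s.reindex(t.index.union(s.index))
# assign t's raw values at t's labels, so t's NaN overwrites as well
res.loc[t.index] = t.to_numpy()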
print(res)

It would be good if combine_first had an option to overwrite values even if they're np.nan :)

@giuliobeseghi

Shall we suggest combine_first in the docs of Series.update and DataFrame.update?

"For non-inplace update consider combine_first"

@mroeschke mroeschke added Enhancement and removed API Design Indexing Related to indexing on series/frames, not to indexes themselves labels Apr 18, 2021
@jbrockmendel jbrockmendel added the inplace Relating to inplace parameter or equivalent label Oct 29, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@peterhadlaw

peterhadlaw commented Mar 30, 2023

combine_first only works when the original DataFrame has null values to be ... updated. update, however, will overwrite whatever the original values were at a given index, regardless of whether they were null (and, more importantly, leave values from the starting DF intact if their index was not in the updating DF).

I think there is no way to do this sort of "pick the values from a new DataFrame" in a functional manner.

I guess maybe .combine(other, lambda a, b: b) would accomplish this? But update directly feels more elegant and appropriate.
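
To make the difference concrete, here is a small sketch with made-up frames. It contrasts `combine_first`, `update` applied to a copy, and `combine_first` with the operands swapped. The last one is only one possible functional workaround, offered under the assumption that the updating frame's labels are a subset of the original's.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=[0, 1, 2])
other = pd.DataFrame({"a": [20.0]}, index=[1])

# combine_first keeps the caller's non-null values, so nothing changes here
filled = df.combine_first(other)

# update overwrites matching labels with other's non-NA values; it works
# in place, so it is applied to a copy to keep df untouched
overwritten = df.copy()
overwritten.update(other)

# Swapping the operands gives other's values priority and falls back to df
swapped = other.combine_first(df)

print(filled, overwritten, swapped, sep="\n\n")
```

The swapped form matches `update` in this example, but it is not a drop-in replacement: if `other` carried labels missing from `df`, `combine_first` would add them to the result, whereas `update` ignores them.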

@mroeschke
Member

Given the more active discussion about deprecating/retooling inplace parameters across pandas, I'm going to close this in favor of that discussion: #16529
