REGR: properly update DataFrame cache in Series.__setitem__ #48215

Merged
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.4.rst
@@ -26,6 +26,7 @@ Fixed regressions
- Fixed regression in :meth:`DataFrame.loc` setting a length-1 array like value to a single value in the DataFrame (:issue:`46268`)
- Fixed regression when slicing with :meth:`DataFrame.loc` with :class:`DateOffset`-index (:issue:`46671`)
- Fixed regression in setting ``None`` or non-string value into a ``string``-dtype Series using a mask (:issue:`47628`)
- Fixed regression in updating a DataFrame column through :meth:`Series.__setitem__` (:issue:`47172`)
Member

Series.__setitem__ doesn't render in the built docs, so perhaps reword? Maybe it should also mention the memory leak, or that the values were not updated in place when using chained indexing with assignment.

- Fixed regression in :meth:`DataFrame.select_dtypes` returning a view on the original DataFrame (:issue:`48090`)
- Fixed regression using custom Index subclasses (for example, used in xarray) with :meth:`~DataFrame.reset_index` or :meth:`Index.insert` (:issue:`47071`)
- Fixed regression in :meth:`DatetimeIndex.intersection` when the :class:`DatetimeIndex` has dates crossing daylight savings time (:issue:`46702`)
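For context, the review comment above mentions chained indexing with assignment. A minimal sketch of the scenario this whatsnew entry refers to, modelled on the test added further down in this PR and assuming pandas 1.4.x defaults without copy-on-write, could look like this:

import numpy as np
import pandas as pd

labels = [f"c{i}" for i in range(10)]
df = pd.DataFrame({col: np.zeros(len(labels)) for col in labels}, index=labels)

# chained indexing with assignment: df[label] returns a cached Series that,
# without copy-on-write, is a view on the DataFrame's underlying values
for label in df.columns:
    df[label][label] = 1

# with the regression, writes like this did not update the DataFrame's
# existing values in place; after the fix the diagonal is written through
print(np.diag(df.to_numpy()))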
2 changes: 1 addition & 1 deletion pandas/core/series.py
@@ -1169,7 +1169,7 @@ def __setitem__(self, key, value) -> None:
self._set_with(key, value)

if cacher_needs_updating:
self._maybe_update_cacher()
self._maybe_update_cacher(inplace=True)

def _set_with_engine(self, key, value) -> None:
loc = self.index.get_loc(key)
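For illustration, the caching relationship this one-line change is concerned with looks roughly like the following. This is a sketch that peeks at private pandas 1.4.x internals (the _item_cache attribute) purely to show the mechanism; it is not public API, and the behaviour shown is the intended non-copy-on-write behaviour:

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.zeros(3), "b": np.zeros(3)})

ser = df["a"]                        # DataFrame.__getitem__ caches this Series
assert df._item_cache["a"] is ser    # private item cache (pandas 1.4.x internals)

# Series.__setitem__ now finishes with _maybe_update_cacher(inplace=True), so
# the parent DataFrame's existing column values are updated in place rather
# than the cached column being replaced by a new copy.
ser[0] = 1.0
print(df.loc[0, "a"])                # 1.0 without copy-on-write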
18 changes: 18 additions & 0 deletions pandas/tests/frame/indexing/test_setitem.py
@@ -1235,3 +1235,21 @@ def test_setitem_not_operating_inplace(self, value, set_value, indexer):
view = df[:]
df[indexer] = set_value
tm.assert_frame_equal(view, expected)

@td.skip_array_manager_invalid_test
Member

This skips, whereas skip_array_manager_not_yet_implemented xfails.

With the ArrayManager, this test fails with AttributeError: 'ArrayManager' object has no attribute 'blocks', since we are explicitly testing the BlockManager values.

So it does make sense to just skip.

But are we likely to want a similar test for the array manager?

Member Author

Yes, I think it indeed makes sense to skip this test instead of xfailing it, since the way it is written it is not intended to ever work for ArrayManager. But we could indeed write a separate test that checks the ArrayManager case (I would say it's maybe not a super high priority; I can only take a look over the weekend).

Member

> but I would say it's maybe not a super high priority

No problem, not a blocker.

def test_setitem_column_update_inplace(self, using_copy_on_write):
# https://github.com/pandas-dev/pandas/issues/47172

labels = [f"c{i}" for i in range(10)]
df = DataFrame({col: np.zeros(len(labels)) for col in labels}, index=labels)
values = df._mgr.blocks[0].values

for label in df.columns:
df[label][label] = 1

if not using_copy_on_write:
# diagonal values all updated
assert np.all(values[np.arange(10), np.arange(10)] == 1)
else:
# original dataframe not updated
assert np.all(values[np.arange(10), np.arange(10)] == 0)
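
The review thread above discusses a possible separate test for the ArrayManager case. A rough, untested sketch of what such a test might look like is given below; it is not part of this PR. The mode.data_manager option and the per-column _mgr.arrays attribute are pandas 1.4.x internals used here as assumptions, and the expected in-place behaviour under the ArrayManager would still need to be confirmed.

import numpy as np
import pandas as pd
import pandas._testing as tm


def test_setitem_column_update_inplace_array_manager():
    # build the DataFrame with the (experimental) ArrayManager backend
    with pd.option_context("mode.data_manager", "array"):
        labels = [f"c{i}" for i in range(10)]
        df = pd.DataFrame(
            {col: np.zeros(len(labels)) for col in labels}, index=labels
        )

    # keep references to the per-column arrays held by the ArrayManager
    arrays = list(df._mgr.arrays)

    for label in df.columns:
        df[label][label] = 1

    # if the ArrayManager hands out views and updates columns in place, each
    # original column array should now hold a 1 at its diagonal position
    result = np.array([arr[i] for i, arr in enumerate(arrays)])
    tm.assert_numpy_array_equal(result, np.ones(10))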