BUG: Setting values on a multiindex df, via loc, does nothing on 1.4.x #46983

Open
2 of 3 tasks
ghost opened this issue May 10, 2022 · 6 comments
Labels
Bug Copy / view semantics Indexing Related to indexing on series/frames, not to indexes themselves Regression Functionality that used to work in a prior pandas version Warnings Warnings that appear or should be added to pandas

Comments

@ghost
ghost commented May 10, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
print("pandas version:", pd.__version__)

index = pd.MultiIndex.from_tuples(
    [("A", "a"), ("A", "b"), ("A", "c"), ("A", "d"), ("B", "a"), ("B", "b")]
)

df = pd.DataFrame(
    [[10, 100], [20, 200], [30, 300], [40, 400], [50, 500], [60, 600]],
    columns=["val1", "val2"],
    index=index,
)

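# Chained indexing: df.loc['A'] produces an intermediate object, and
# ['val1'] = 1 is then assigned on that object rather than on df directly.
df.loc['A']['val1'] = 1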
print(df)

Issue Description

On pandas 1.4.x (including 1.4.2), the example code does not modify df: the value 1 is never assigned.
It prints:

pandas version: 1.4.2
     val1  val2
A a    10   100
  b    20   200
  c    30   300
  d    40   400
B a    50   500
  b    60   600

(This bug could be linked to #46837)

Expected Behavior

On pandas 1.3.5, the example prints:

pandas version: 1.3.5
     val1  val2
A a     1   100
  b     1   200
  c     1   300
  d     1   400
B a    50   500
  b    60   600
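To see why the chained statement can miss df, it helps to split it into the two operations Python actually performs: a `__getitem__` on `df.loc`, then a `__setitem__` on whatever that returns. The sketch below reuses the frame from the reproducible example; the name `subset` is illustrative only. It assumes pandas 1.4 or later, where, per the report above, the write does not reach df. Depending on the version and options, pandas may also emit a SettingWithCopyWarning here.

```python
import pandas as pd

# Same frame as in the reproducible example.
index = pd.MultiIndex.from_tuples(
    [("A", "a"), ("A", "b"), ("A", "c"), ("A", "d"), ("B", "a"), ("B", "b")]
)
df = pd.DataFrame(
    [[10, 100], [20, 200], [30, 300], [40, 400], [50, 500], [60, 600]],
    columns=["val1", "val2"],
    index=index,
)

# Step 1: df.loc["A"] builds an intermediate DataFrame with the rows under the
# first-level key "A" (indexed by the second level only). Whether it shares
# memory with df is an implementation detail.
subset = df.loc["A"]

# Step 2: __setitem__ runs on the intermediate object, not on df. When this
# replaces the column array instead of writing into it, df is left unchanged.
subset["val1"] = 1

print(subset["val1"].tolist())  # → [1, 1, 1, 1]
print(df["val1"].tolist())
```

On 1.3.5, the expected output above shows that the equivalent chained statement did write through to df.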

Installed Versions

INSTALLED VERSIONS

commit : 4bfe3d0
python : 3.8.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United Kingdom.1252

pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 22.0.4
setuptools : 56.0.0
Cython : None
pytest : 7.1.1
hypothesis : None
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.1
IPython : 8.2.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : 2.1.1
matplotlib : 3.5.1
numba : None
numexpr : 2.8.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
snappy : None
sqlalchemy : None
tables : 3.7.0
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

@ghost ghost added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels May 10, 2022
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue May 16, 2022
@simonjayhawkins
Member

Thanks @Kerybas for the report.

> (This bug could be linked to #46837)

first bad commit: [03dd698] BUG: DataFrame.__setitem__ sometimes operating inplace (#43406)

I think the old behavior was considered a bug, even though it was long-standing.

I've labelled this as a regression for now, pending further discussion.

cc @jbrockmendel
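A minimal sketch of the behavior change behind the first bad commit, as I read its title: since 1.4, `df[col] = value` appears to store a new array for the column instead of writing into the existing one, so a handle taken on the old array (the illustrative name `before`) keeps the old values. `np.shares_memory` is used only to check whether the two arrays overlap; this is an illustration under those assumptions, not a description of pandas internals.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"val1": [10, 20, 30], "val2": [100, 200, 300]})

# Keep a handle on the array currently backing the "val1" column.
before = df["val1"].to_numpy()

# Column assignment goes through DataFrame.__setitem__.
df["val1"] = 1

# The column is now backed by a different array; the old one is untouched.
print(before.tolist())  # → [10, 20, 30]
print(df["val1"].tolist())  # → [1, 1, 1]
print(np.shares_memory(before, df["val1"].to_numpy()))  # → False
```

In the chained example, the same replacement happens on the intermediate object returned by `df.loc['A']`, which is why df never sees the new values.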

@simonjayhawkins simonjayhawkins added Copy / view semantics and removed Needs Triage Issue that has not been reviewed by a pandas team member labels May 16, 2022
@simonjayhawkins simonjayhawkins added this to the 1.4.3 milestone May 16, 2022
@simonjayhawkins simonjayhawkins added Regression Functionality that used to work in a prior pandas version Indexing Related to indexing on series/frames, not to indexes themselves labels May 16, 2022
@ghost
Author

ghost commented May 16, 2022

Hi @simonjayhawkins,

Which one is considered the buggy behaviour: what we have in 1.3.5 or what we have in 1.4?

As a user, I would expect the former behaviour (from 1.3.5), where .loc actually modifies values.
If not, I would at least expect a warning rather than no change and no feedback.

@phofl
Member

phofl commented May 27, 2022

In general, I would recommend avoiding chained indexing and using

df.loc['A', 'val1'] = 1
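A sketch of the recommended single-call form on the frame from the issue: because `.loc` receives the row key and the column key together, the assignment targets df itself rather than an intermediate object.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("A", "a"), ("A", "b"), ("A", "c"), ("A", "d"), ("B", "a"), ("B", "b")]
)
df = pd.DataFrame(
    [[10, 100], [20, 200], [30, 300], [40, 400], [50, 500], [60, 600]],
    columns=["val1", "val2"],
    index=index,
)

# One .loc call: "A" selects every row under the first MultiIndex level,
# "val1" selects the column, and the write lands directly in df.
df.loc["A", "val1"] = 1

print(df["val1"].tolist())  # → [1, 1, 1, 1, 50, 60]
```

This matches the 1.3.5 output in the report without relying on view/copy semantics.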

@simonjayhawkins
Member

As a workaround, I would suggest using either

df.loc[pd.IndexSlice["A", :], "val1"] = 1

or

df.loc[("A",), "val1"] = 1

to make the code more robust if column names also appear in the levels of the MultiIndex.
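Both workaround forms can be compared on the issue's frame. The helper `make_frame` is hypothetical, added only so each form starts from a fresh copy. The benefit of either spelling is that the row part of the key is unambiguous: with `df.loc["A", "val1"]`, pandas may first try `("A", "val1")` as a full row label of the MultiIndex, which would match if `"val1"` were also a second-level label.

```python
import pandas as pd


def make_frame():
    # Hypothetical helper: rebuilds the frame from the issue.
    index = pd.MultiIndex.from_tuples(
        [("A", "a"), ("A", "b"), ("A", "c"), ("A", "d"), ("B", "a"), ("B", "b")]
    )
    return pd.DataFrame(
        [[10, 100], [20, 200], [30, 300], [40, 400], [50, 500], [60, 600]],
        columns=["val1", "val2"],
        index=index,
    )


idx = pd.IndexSlice

# Form 1: the row key is ("A", <all second-level labels>), spelled with IndexSlice.
df1 = make_frame()
df1.loc[idx["A", :], "val1"] = 1

# Form 2: a one-element tuple is a partial key on the first level only.
df2 = make_frame()
df2.loc[("A",), "val1"] = 1

print(df1["val1"].tolist())  # → [1, 1, 1, 1, 50, 60]
print(df1.equals(df2))  # → True
```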


> Which one is considered the buggy behaviour?
> what we have in 1.3.5 or what we have in 1.4?

both

> As a user, I would expect the former behaviour, from 1.3.5, where .loc actually modifies values.
> And if not, then I would at least expect a warning instead of no change and no feedback.

Yes. This should raise a warning if we decide not to revert to the 1.3.5 behavior.

@simonjayhawkins
Member

Moving to 1.4.4.

@simonjayhawkins simonjayhawkins modified the milestones: 1.4.3, 1.4.4 Jun 22, 2022
@simonjayhawkins
Member

Removing the 1.4.x milestone.

@simonjayhawkins simonjayhawkins removed this from the 1.4.4 milestone Aug 30, 2022