When assigning via the .loc indexer, I'm running into issues with a string index. The example shows various attempts at assigning with loc against different indices: a single int column, a single string column, and a two-column index.
The attempts are labeled along two axes:
A vs. B: for single-column indexes, use a flat value (A) or a tuple (B). Multi-column indexes can only use a tuple (B).
1 vs. 2: use df.loc[df.index == value, column] (1) or df.loc[value, column] (2).
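The original code sample is not reproduced here, so the following is only a minimal sketch of the flat-value variations (A1 and A2) on single-column indexes. The frame contents, the column name "total" borrowed from the replies, and the assigned values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical frames: one with an int index, one with a string index.
df_int = pd.DataFrame({"total": [0, 0, 0]}, index=[1, 2, 3])
df_str = pd.DataFrame({"total": [0, 0]}, index=["20-25", "26-30"])

# Variation A1: flat value compared against the index (boolean mask).
df_int.loc[df_int.index == 2, "total"] = 10
df_str.loc[df_str.index == "26-30", "total"] = 30

# Variation A2: flat value used directly as the row label.
df_int.loc[3, "total"] = 20
df_str.loc["20-25", "total"] = 40
```

The tuple variations (B1, B2) are left out on purpose, since their behavior on flat indexes is what the issue is about.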
Expected Output
I'd like to use a single variation for all index types (but it seems no single method works). Ideally, it would be 'B2', but that does not work for a string-based index.
I think you've identified two inconsistencies/likely bugs here.
Part of the issue is that tuples are valid members of a standard index, so df.loc['20-25', 'total'] and df.loc[('20-25',), 'total'] could potentially have different semantics. But it is indeed weird that df.loc[1, 'total'] and df.loc[(1,), 'total'] give the same result -- we shouldn't have different behavior for string vs numeric indexes.
To get the right result with a MultiIndex, you need to index either like df.loc['20-25', 'total'] or df.loc[('20-25', slice(None)), 'total'] (i.e., filling out all the trailing indexer levels). Unfortunately, you can't treat a non-MultiIndex like a single-level MultiIndex -- as noted above, you'll need to unpack the tuple. This is part of the larger issue of consistency between the MultiIndex and Index APIs (#3268).
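As a rough illustration of the two MultiIndex forms mentioned above, here is a small sketch. The data is made up, and the level order (age first, then size) is an assumption inferred from the examples in this thread.

```python
import pandas as pd

# Hypothetical data with a two-level (age, size) index, sorted so slicing works.
df = pd.DataFrame(
    {"age": ["20-25", "20-25", "26-30"], "size": [1, 2, 1], "total": [0, 0, 0]}
).set_index(["age", "size"]).sort_index()

# Fill out the trailing level explicitly with slice(None).
df.loc[("20-25", slice(None)), "total"] = 5

# A fully specified key addresses exactly one row.
df.loc[("26-30", 1), "total"] = 7

# Partial indexing on the first level selects every matching row.
selected = df.loc["20-25", "total"].tolist()
```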
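One option would be to avoid using a MultiIndex for indexing at all, and stick with using a boolean indexer for the rows, e.g., df.loc[(df['size'] == 1) & (df['age'] == '20-25'), 'total']. (Use bracket access for the size column: df.size is the built-in DataFrame attribute that returns the number of elements, not the column.)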
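A minimal sketch of the boolean-indexer approach, assuming size and age are kept as ordinary columns and not moved into the index. The data values are invented for illustration.

```python
import pandas as pd

# Hypothetical frame with 'size' and 'age' as regular columns.
df = pd.DataFrame({
    "size": [1, 1, 2],
    "age": ["20-25", "26-30", "20-25"],
    "total": [0, 0, 0],
})

# Build the row mask from column comparisons; bracket access avoids
# the clash with the DataFrame.size attribute.
mask = (df["size"] == 1) & (df["age"] == "20-25")
df.loc[mask, "total"] = 10
```

Because the mask is built from column values, the same pattern applies regardless of the column dtypes, at the cost of giving up label-based lookup on the index.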
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: 3.7.2
pip: 18.0
setuptools: 40.2.0
Cython: None
numpy: 1.15.1
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.2.11
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None