BUG: indexing with loc and iloc with list-likes and new dtypes do not change from object dtype #20635

ayhanfuat · 2018-04-08T10:51:42Z

df = pd.DataFrame({'A': ['a', 'b'], 'B': ['1', '2'], 'C': ['3', '4']})  
df.loc[:, ['B', 'C']] = df.loc[:, ['B', 'C']].astype('int')
df.dtypes

A    object
B    object
C    object
dtype: object

When I try to update multiple object columns with loc/iloc, the values in the columns change but object dtype is preserved. This is not the case for numeric dtypes.

df = pd.DataFrame({'A': ['a', 'b'], 'B': [1, 2], 'C': [3, 4]})
df.loc[:, ['B', 'C']] = df.loc[:, ['B', 'C']].astype('float')
df.dtypes

A     object
B    float64
C    float64
dtype: object

Shouldn't the columns in the first example have integer dtypes? I found this issue but it seems it is specific to extension arrays. Also, if I try it with a single column like the one in the linked issue, the dtype changes:

df = pd.DataFrame({'A': ['a', 'b'], 'B': ['1', '2'], 'C': ['3', '4']})
df.loc[:, 'B'] = df.loc[:, 'B'].astype('int')
df.dtypes

A    object
B     int64
C    object
dtype: object
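As a possible workaround while the list-like `loc` assignment keeps object dtype, here is a minimal sketch that avoids writing into the existing columns. It assumes the goal is simply to end up with integer columns; the names `converted` and `replaced` are only for illustration, and `'int64'` is spelled out so the result does not depend on the platform's default integer size.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b'], 'B': ['1', '2'], 'C': ['3', '4']})

# Option 1: astype with a column -> dtype mapping returns a new frame
# in which only the listed columns are cast.
converted = df.astype({'B': 'int64', 'C': 'int64'})

# Option 2: assign each column by name. This replaces the column
# instead of setting values into the existing object-dtype data.
replaced = df.copy()
for col in ['B', 'C']:
    replaced[col] = replaced[col].astype('int64')

print(converted.dtypes)
print(replaced.dtypes)
```

Both options sidestep `df.loc[:, ['B', 'C']] = ...` entirely, so they do not depend on how that assignment treats dtypes.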

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-42-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.4.1
pip: 9.0.2
setuptools: 38.5.1
Cython: 0.27.3
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.7.0
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.5.0
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.4
pymysql: None
psycopg2: 2.7.4 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


jreback · 2018-04-09T15:09:19Z

the first example should be int. this is a bug. if you'd like to have a look would be great.

designMoreWeb · 2019-02-26T20:34:50Z

In my testing, I have been getting this bug when the starting DataFrame is all strings.
No other type has given me any issues.

```python
import pandas as pd

print('DF2 as Data Frame')
print("Starting set are all floats")
print('-----------------')
df2 = pd.DataFrame({'L': [0.0, 2.5], 'M': [3.5, 8.5], 'N': [9.6, 10.0]})
print(df2.dtypes)
print('----------------')
df2.loc[:, ['L', 'M']] = df2.loc[:, ['L', 'M']].astype('str')
print(df2.dtypes)
print('----------------')
df2.loc[:, 'M'] = df2.loc[:, 'M'].astype('str')
print(df2.dtypes)
print('----------------')
df2.loc[:, ['L', 'N']] = df2.loc[:, ['L', 'N']].astype('str')
print(df2.dtypes)
print('----------------')
print('----------------')

print('DF as Data Frame')
print("Starting set are all ints")
print('----------------')
df = pd.DataFrame({'D': [2, 3], 'E': [4, 5], 'F': [8, 9]})
print(df.dtypes)
print('----------------')
df.loc[:, ['E', 'F']] = df.loc[:, ['E', 'F']].astype('float')
print(df.dtypes)
print('----------------')
df.loc[:, 'E'] = df.loc[:, 'E'].astype('int')
print(df.dtypes)
print('----------------')
df.loc[:, ['D', 'F']] = df.loc[:, ['D', 'F']].astype('int')
print(df.dtypes)
print('----------------')
print('----------------')

print('DF3 as Data Frame')
print("Starting set are all str")
print('----------------')
df3 = pd.DataFrame({'J': ['2', '3'], 'K': ['4', '5'], 'G': ['8', '9']})
print(df3.dtypes)
print('----------------')
df3.loc[:, ['J', 'G']] = df3.loc[:, ['J', 'G']].astype('int')
print(df3.dtypes)
print('----------------')
df3.loc[:, 'J'] = df3.loc[:, 'J'].astype('int')
print(df3.dtypes)
print('----------------')
df3.loc[:, ['K', 'G']] = df3.loc[:, ['K', 'G']].astype('float')
print(df3.dtypes)
print('----------------')
print('----------------')

print('DF4 as Data Frame')
print("Starting set are a combination of floats and ints")
print('----------------')
df4 = pd.DataFrame({'X': [2, 3.2], 'Y': [4.5, 5.5], 'Z': [8, 9]})
print(df4.dtypes)
print('----------------')
df4.loc[:, ['X', 'Y']] = df4.loc[:, ['X', 'Y']].astype('int')
print(df4.dtypes)
print('----------------')
df4.loc[:, 'Z'] = df4.loc[:, 'Z'].astype('float')
print(df4.dtypes)
print('----------------')
df4.loc[:, ['X', 'Z']] = df4.loc[:, ['X', 'Z']].astype('int')
print(df4.dtypes)
print('----------------')
print('----------------')
```

results

DF2 as Data Frame
Starting set are all floats

L float64
M float64
N float64
dtype: object

L object
M object
N float64
dtype: object

L object
M object
N float64
dtype: object

L object
M object
N object
dtype: object

DF as Data Frame
Starting set are all ints

D int64
E int64
F int64
dtype: object

D int64
E float64
F float64
dtype: object

D int64
E int64
F float64
dtype: object

D int64
E int64
F int64
dtype: object

DF3 as Data Frame
Starting set are all str

G object
J object
K object
dtype: object

G object
J object
K object
dtype: object

G object
J int64
K object
dtype: object

G float64
J int64
K float64
dtype: object

DF4 as Data Frame
Starting set are a combination of floats and ints

X float64
Y float64
Z int64
dtype: object

X int64
Y int64
Z int64
dtype: object

X int64
Y int64
Z float64
dtype: object

X int64
Y int64
Z int64
dtype: object

[Done] exited with code=0 in 1.637 seconds

The issue does not occur when the columns start out as ints, floats, or a combination of the two. I think this is because strings are stored with object dtype, while ints and floats have numeric dtypes. When the string columns are converted to a numeric type through a list-like `loc` assignment, the converted values appear to be written into a temporary that keeps the object dtype, and the dtype only changes when a conversion is applied again (see the DF3 output above).
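To illustrate the distinction between the values and the dtype, here is a small sketch (not taken from the thread) showing that a column can hold Python ints while still reporting object dtype. The Series `held` is a made-up stand-in for a column after such an assignment; `pd.api.types.infer_dtype` and `infer_objects` are existing pandas functions, used here only to inspect and re-infer the dtype.

```python
import pandas as pd

# A hypothetical column whose values are already ints but whose dtype is object.
held = pd.Series([1, 2], dtype=object)

# The container dtype is object even though every element is an int.
print(held.dtype)
print(pd.api.types.infer_dtype(held))

# infer_objects() re-examines the values and returns a Series
# with a more specific dtype where one can be inferred.
inferred = held.infer_objects()
print(inferred.dtype)
```

Checking `infer_dtype` alongside `dtypes` can help tell apart "the values were not converted" from "the values were converted but the dtype was not updated".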

This could be related to issue #11617.

jbrockmendel · 2022-01-08T20:46:35Z

similar to #24269

jreback added the Bug, Indexing (related to indexing on series/frames, not to indexes themselves), and Difficulty Intermediate labels Apr 9, 2018

jreback added this to the Next Major Release milestone Apr 9, 2018

jreback changed the title ~~loc and iloc do not change object dtype~~ BUG: indexing with loc and iloc with list-likes and new dtypes do not change from object dtype Apr 9, 2018

Rik-de-Kort mentioned this issue Apr 20, 2018

Dataframe column filled with .sum() of int-series has dtype not int64, but object #20754

Closed

jbrockmendel removed Difficulty Intermediate labels Oct 21, 2019

phofl mentioned this issue Nov 10, 2020

BUG: Bug in loc did not change dtype when complete column was assigned #37749

Closed

8 tasks

jbrockmendel mentioned this issue Jan 6, 2022

BUG/API: DataFrame.iloc[:, foo] = bar inplaceness? #44353

Closed

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
