Reading a CSV file in which an integer column has empty cells casts that column to float. This later causes problems when merging the dataframe on that column with a dataframe where the corresponding column is int.
It would be nice if a warning were printed when such a conversion takes place (maybe only when an explicit dtype={"col": np.int64} setting is passed to read_csv), and if I could optionally specify that such rows should be dropped (isn't there an NA value for int columns...?).
data = """YEAR, DOY, a
2001,106380451,10
2001,,11
2001,106380451,67"""
from io import StringIO
import numpy as np
import pandas
f = pandas.read_csv(StringIO(data), sep=",", dtype={'DOY': np.int64})
f.dtypes
YEAR int64
DOY float64
a int64
I don't mind that it is not possible (yet). What I do mind is that read_csv changed the dtype even though I specified it, and did so silently (no exception, no warning).
Would it be possible to add a parameter that specifies a strategy (drop the row, throw an exception, cast to float) for such cases? I tried to understand the code, and it seems to operate on columns, so dropping rows when an int is NA does not look like an easy option :-(
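For illustration, the three strategies mentioned above (drop the row, throw an exception, cast to float) can be approximated after parsing rather than inside the parser. This is only a rough sketch: `read_csv_int_columns` and its `on_missing` parameter are hypothetical names, not part of pandas, and the sample data uses a header without spaces so the column names match exactly. Because the column passes through float64 first, integers above 2**53 could lose precision.

```python
from io import StringIO

import numpy as np
import pandas as pd


def read_csv_int_columns(source, int_columns, on_missing="raise", **kwargs):
    """Hypothetical helper (not a pandas API): read a CSV, then apply a
    strategy to every listed integer column that contains missing values."""
    if on_missing not in {"raise", "drop", "float"}:
        raise ValueError(f"unknown strategy: {on_missing!r}")
    frame = pd.read_csv(source, **kwargs)
    for col in int_columns:
        missing = frame[col].isna()
        if missing.any():
            if on_missing == "raise":
                raise ValueError(
                    f"column {col!r} has {int(missing.sum())} missing value(s)"
                )
            if on_missing == "float":
                continue  # keep the float64 column that read_csv inferred
            # on_missing == "drop": discard the rows with an empty cell
            frame = frame.loc[~missing].copy()
        frame[col] = frame[col].astype(np.int64)
    return frame


data = "YEAR,DOY,a\n2001,106380451,10\n2001,,11\n2001,106380451,67"

dropped = read_csv_int_columns(StringIO(data), ["DOY"], on_missing="drop")
print(dropped.dtypes)
```

With `on_missing="drop"` the row with the empty DOY cell is removed and the column is converted to int64; with `on_missing="float"` the frame is returned as read_csv produced it.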
Now that we have nullable integers since 0.24.0, wouldn't it be a good idea to add a parameter to read_csv like 'use_nullable_ints' to enable inference of Int64 columns?
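As a point of comparison, the nullable integer dtype can already be requested explicitly per column through the existing `dtype` argument, without the proposed 'use_nullable_ints' parameter (which remains a suggestion in this thread, not an actual pandas option). A minimal sketch, assuming a pandas version with nullable integer support and a header without spaces:

```python
from io import StringIO

import pandas as pd

data = "YEAR,DOY,a\n2001,106380451,10\n2001,,11\n2001,106380451,67"

# "Int64" (capital I) names the nullable extension dtype, as opposed to
# NumPy's "int64"; the empty cell is stored as a missing value.
frame = pd.read_csv(StringIO(data), dtype={"DOY": "Int64"})

print(frame.dtypes)
print(frame["DOY"].isna().tolist())
```

This only covers columns that are named up front; it does not make read_csv infer Int64 on its own. pandas 2.0 and later also accept `dtype_backend="numpy_nullable"` in read_csv, which applies nullable dtypes to the whole frame.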