read_csv, integer dtype and empty cells #2631

jankatins · 2013-01-03T22:26:01Z

Reading a CSV file with an integer column that contains empty cells casts that column to float. In the end this caused problems when merging this DataFrame on that column with another DataFrame whose corresponding column is int.

It would be nice if a warning were printed when such a conversion takes place (perhaps only when an explicit dtype={"col": np.int64} is passed to read_csv), and if I could optionally specify that such rows should be dropped (isn't there an NA value for int columns...?).

```python
from io import StringIO

import numpy as np
import pandas

data = """YEAR, DOY, a
2001,106380451,10
2001,,11
2001,106380451,67"""
f = pandas.read_csv(StringIO(data), sep=",", dtype={'DOY': np.int64})
f.dtypes
```

```
YEAR      int64
 DOY    float64
 a        int64
```


wesm · 2013-01-03T22:27:35Z

There are no integer NA values, unfortunately. I plan to fix this one of these days (it's a big project that probably requires circumventing NumPy).
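The upcast follows from how NumPy stores data: NaN is a floating-point value, and int64 has no bit pattern reserved for "missing". A minimal NumPy-only illustration (not pandas' internal code) of why a column with an empty cell ends up as float64:

```python
import numpy as np

# Mixing integers with NaN forces a float64 array: NaN only exists for floats,
# and there is no int64 sentinel that could mean "missing".
values = np.array([106380451, np.nan, 106380451])
print(values.dtype)  # float64

# float64 represents integers exactly up to 2**53, so these values survive the
# upcast unchanged; the missing cell is simply stored as NaN.
print(values[0] == 106380451)  # True
print(np.isnan(values).sum())  # 1
```

The cost is mostly semantic: the column still holds the right numbers, but its dtype no longer matches an int64 key column in another DataFrame.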

jankatins · 2013-01-03T23:37:19Z

I don't mind that it isn't possible (yet), but I do mind that read_csv changed the dtype even though I specified it, and didn't say anything (no exception, no warning).

pandas/src/parser.pyx has a commented-out exception around line 900, which seems to do what I expected...?

Would it be possible to add a parameter that specifies a strategy for such cases (drop the row, raise an exception, or cast to float)? I tried to understand the code, and it seems to operate on columns, so dropping rows when an int value is NA doesn't look like an easy option :-(
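No such parameter exists on read_csv; the three strategies can, however, be approximated after parsing. The sketch below is a hypothetical helper (`read_csv_int_column` and its `on_na` argument are invented for illustration, not pandas API): it lets pandas read the column with its default float upcast, then either keeps it, drops the rows with missing values and casts to int64, or raises.

```python
from io import StringIO

import pandas as pd


def read_csv_int_column(buf, col, on_na="float", **kwargs):
    """Hypothetical helper: read a CSV and decide what to do when the
    integer column `col` contains empty cells.

    on_na="float": keep pandas' default behaviour (column becomes float64)
    on_na="drop":  drop rows where `col` is missing, then cast it to int64
    on_na="raise": raise ValueError if any value in `col` is missing
    """
    df = pd.read_csv(buf, **kwargs)
    missing = df[col].isna()
    if not missing.any():
        # No empty cells: the column can safely be int64.
        return df.astype({col: "int64"})
    if on_na == "float":
        return df
    if on_na == "drop":
        return df.loc[~missing].astype({col: "int64"})
    if on_na == "raise":
        rows = list(df.index[missing.to_numpy()])
        raise ValueError(f"column {col!r} has missing values in rows {rows}")
    raise ValueError(f"unknown on_na strategy: {on_na!r}")


data = """YEAR, DOY, a
2001,106380451,10
2001,,11
2001,106380451,67"""

# skipinitialspace=True strips the space after each comma in the header,
# so the column is named "DOY" rather than " DOY".
df = read_csv_int_column(StringIO(data), "DOY", on_na="drop", skipinitialspace=True)
print(df.dtypes)
```

Working on the parsed column, rather than inside the tokenizer, sidesteps the column-oriented parser the comment describes, at the price of a temporary float64 copy.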

wesm · 2013-01-20T00:11:33Z

Done. Thanks for the suggestion; I agree that raising the exception is the right move. In your example, note that you need to pass skipinitialspace=True.
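The skipinitialspace remark explains the original output: the header "YEAR, DOY, a" has a space after each comma, so the parsed column is " DOY" and the dtype keyed on "DOY" never applied to it. A short check against a recent pandas release (the exact error message may differ between versions):

```python
from io import StringIO

import numpy as np
import pandas as pd

data = """YEAR, DOY, a
2001,106380451,10
2001,,11
2001,106380451,67"""

# Without skipinitialspace the header fields keep their leading spaces.
cols = list(pd.read_csv(StringIO(data)).columns)
print(cols)  # ['YEAR', ' DOY', ' a']

# With skipinitialspace=True the column really is named "DOY", so the
# requested int64 dtype applies and the empty cell makes the parser raise.
raised = False
try:
    pd.read_csv(StringIO(data), skipinitialspace=True, dtype={"DOY": np.int64})
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```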

StefRe · 2019-10-05T18:49:17Z

Now that we have nullable integers since 0.24.0, wouldn't it be a good idea to add a parameter to read_csv like 'use_nullable_ints' to enable inference of Int64 columns?
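With a recent pandas release, the nullable "Int64" extension dtype can already be requested per column, and pandas 2.0 added a `dtype_backend="numpy_nullable"` argument to read_csv that applies nullable dtypes to every column. A minimal sketch of the per-column form on the example from this issue:

```python
from io import StringIO

import pandas as pd

data = """YEAR, DOY, a
2001,106380451,10
2001,,11
2001,106380451,67"""

# "Int64" (capital I) is pandas' nullable integer dtype: the empty cell
# becomes pd.NA instead of forcing the whole column to float64.
f = pd.read_csv(StringIO(data), skipinitialspace=True, dtype={"DOY": "Int64"})
print(f.dtypes)
print(f["DOY"].isna().sum())  # 1
```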

ghost assigned wesm Jan 20, 2013

wesm closed this as completed in 5da8df7 Jan 20, 2013
