read_csv's na_values dict format cannot parse float type #12224
Comments
Tried the errored call:
In [4]: co2 = pd.read_csv("ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt",
   ...: comment = "#", delim_whitespace = True,
   ...: names = ["year", "month", "decimal_date", "average", "interpolated", "trend", "days"],
   ...: na_values = {"decimal_date" : -99.99, "days" : -1})
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
....
truncated by bran
....
pandas\io\parsers.py in _clean_na_values(na_values, keep_default_na)
2210 if keep_default_na:
2211 for k, v in compat.iteritems(na_values):
-> 2212 v = set(list(v)) | _NA_VALUES
2213 na_values[k] = v
2214 na_fvalues = dict([
TypeError: 'int' object is not iterable
Seems that read_csv assumes the values in the na_values dict (something like {key: values}) must be iterable.
Current docstring for na_values
I'd be happy to submit a PR for this. |
yeah this is an inconsistency. separately I don't recall why we are defined (different!) |
Here's a nice reproducible example:
>>> from pandas import read_csv
>>> from pandas.compat import StringIO
>>> data = '1,2\n2,1'
>>> read_csv(StringIO(data), names=['a', 'b'], na_values={'a': 2, 'b': 1})
...
TypeError: 'int' object is not iterable
Note that this works, however:
>>> read_csv(StringIO(data), names=['a', 'b'], na_values=1)
     a    b
0  NaN  2.0
1  2.0  NaN |
this works if u pass a dict of lists |
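For anyone landing here on an affected version, here is a minimal sketch of the dict-of-lists workaround, reusing the data from the reproducible example above. It imports StringIO from the standard library's io module rather than from pandas.compat, which is an assumption about the reader's setup, not something taken from the thread.

```python
from io import StringIO  # assumption: stdlib StringIO instead of pandas.compat

import pandas as pd

data = "1,2\n2,1"

# Workaround: wrap each per-column NA value in a list so that every
# dict value is iterable, even when there is only one value per column.
df = pd.read_csv(
    StringIO(data),
    names=["a", "b"],
    na_values={"a": [2], "b": [1]},
)

# Column "a" treats 2 as missing and column "b" treats 1 as missing,
# so only the second row of each column should come back as NaN.
print(df)
```

Unlike the scalar na_values=1 call, the dict form applies each NA value only to its own column.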
@jreback : Agreed, but if we accept scalars, we should accept them in the |
Update documentation to state that scalars are accepted for na_values. In addition, accept scalars for the values when a dictionary is passed in for na_values. Closes gh-12224.
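To make the change concrete, here is a rough sketch of what "accept scalars for the values" could look like for the loop shown in the traceback. This is not the actual pandas patch: clean_na_values and DEFAULT_NA are hypothetical names for illustration, and DEFAULT_NA is only a small stand-in for pandas' internal set of default NA strings.

```python
# Hypothetical stand-in for pandas' internal default NA strings;
# the real set is larger.
DEFAULT_NA = {"", "NaN", "NA", "NULL"}


def clean_na_values(na_values, keep_default_na=True):
    """Normalize a {column: NA value(s)} dict so every value is a set.

    Illustrative only: mirrors the shape of the failing loop in the
    traceback, but wraps scalars before calling set().
    """
    cleaned = {}
    for column, values in na_values.items():
        # A bare number is not iterable, and a bare string would be
        # split into characters, so wrap both in a one-element list.
        if isinstance(values, (str, bytes)) or not hasattr(values, "__iter__"):
            values = [values]
        values = set(values)
        if keep_default_na:
            values = values | DEFAULT_NA
        cleaned[column] = values
    return cleaned


# The dict from the original report no longer raises TypeError.
result = clean_na_values({"decimal_date": -99.99, "days": -1})
```

The string check matters: without it, a value like "-99.99" would be turned into a set of single characters rather than one NA marker.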
I'm confused - I have 0.22.0 but I still get the "not iterable" error if I pass a dict for |
Can you provide an example to reproduce this? |
Yes - see my other comment at the bottom of this issue: #1657 When I issue the dict version of the read_csv from that comment, but remove the quotes from one or more of the numeric values, such that this code
changes to (say) this code:
then I immediately get the "not iterable" error. |
@neilser : How odd! This indeed slipped through the cracks on this one. Please open another issue with this example and your multiple attempts to read it. There is definitely something buggy. |
Sure, will create a new issue when I get a moment (probly tomorrow). Thanks for the feedback :-) |
Minor issue regarding read_csv's na_values argument in dict format. I note that the list format works fine when the NA value is given as a float-type (which is often the intuitive choice), e.g.:
However, the dict format is more appropriate for this classic data set, since different columns define different NA values. Unfortunately, this fails with an error about float type:
and the NA value must be given as a string; which feels all kinds of wrong here:
Thanks for all the pandas awesomeness,