Support parsing thousands separators in floating point data #2594

Closed
wesm opened this issue Dec 24, 2012 · 3 comments
Labels
Bug Enhancement IO CSV read_csv, to_csv IO Data IO issues that don't fit into a more specific label
Comments

@wesm
Member

wesm commented Dec 24, 2012

xref #584

It seems that parsing works for the decimal sign alone or for the thousands separator alone, but not when both appear in the same value.
Reopen the issue?

Example

import pandas as pd
from io import StringIO

# Semicolon-separated data: '.' is the thousands separator, ',' the decimal sign.
# Column A uses thousands only, column B decimals only, column C both.
data = """A;B;C
0;0,11;0,11
1.000;1000,11;1.000,11
20.000;20000,22;20.000,22
300.000;300000,33;300.000,33
4.000.000;4000000,44;4.000.000,44
5.000.000.000;5000000000,55;5.000.000.000,55"""

df = pd.read_csv(StringIO(data), sep=';', thousands='.', decimal=',')
print(df.dtypes)
print(df)

Results in

A      int64
B    float64
C     object

            A             B                 C
0           0  1.100000e-01              0,11
1        1000  1.000110e+03          1.000,11
2       20000  2.000022e+04         20.000,22
3      300000  3.000003e+05        300.000,33
4     4000000  4.000000e+06      4.000.000,44
5  5000000000  5.000000e+09  5.000.000.000,55

@matthias-ollig

I wrote a converter that removes the thousands separator and tried to use it together with the dtype argument of read_csv, without success. What does work, though, is removing the separator and casting in the same step:

# in your case you want to replace the dot as that is your thousands separator
# (the built-in float replaces pd.np.float, which was only an alias for it)
rem_thousand_sep_and_cast_to_float = lambda x: float(x.replace(",", ""))

You can then use that function to convert the desired columns with the converters argument of read_csv. Let me know if that works for you.

Used in an example:

df = pd.io.parsers.read_csv("my.csv", sep=",", thousands=",",
                            converters={"a": rem_thousand_sep_and_cast_to_float,
                                        "b": rem_thousand_sep_and_cast_to_float})
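
For the data in the original report, where '.' is the thousands separator and ',' is the decimal sign, the same idea might look like the sketch below. The helper name `parse_european_float` and its default arguments are hypothetical, chosen only for illustration, and the standard-library `csv` module stands in for `read_csv` so the snippet runs without pandas.

```python
import csv
from io import StringIO


def parse_european_float(text, thousands=".", decimal=","):
    """Convert a string such as '1.000,11' to the float 1000.11.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Drop the thousands separators first, then turn the decimal sign
    # into the '.' that Python's float() expects.
    cleaned = text.strip().replace(thousands, "").replace(decimal, ".")
    return float(cleaned)


# Sample data from the issue report (column C mixes both separators).
data = """A;B;C
0;0,11;0,11
1.000;1000,11;1.000,11
20.000;20000,22;20.000,22
300.000;300000,33;300.000,33
4.000.000;4000000,44;4.000.000,44
5.000.000.000;5000000000,55;5.000.000.000,55"""

rows = list(csv.DictReader(StringIO(data), delimiter=";"))
column_c = [parse_european_float(row["C"]) for row in rows]
print(column_c)
```

With pandas, a function like this could presumably be passed through the converters argument in the same way as above, for example converters={"C": parse_european_float}, leaving the thousands and decimal arguments unset for that column.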

@jreback
Contributor

jreback commented Jul 31, 2013

@hayd
Contributor

hayd commented Aug 26, 2013

fixed by #4598

@hayd hayd closed this as completed Aug 26, 2013