Handle European decimal formats in parsers at a lower level #584

wesm · 2012-01-06T20:25:48Z

No description provided.

wesm · 2012-11-27T16:10:13Z

Done in new parser engine

keluc · 2012-12-18T20:27:31Z

It seems that parsing works when a value uses only the decimal sign or only the thousands separator, but not when both are combined in the same value.
Reopen the issue?

Example

import pandas as pd
from StringIO import StringIO
data = """A;B;C
0;0,11;0,11
1.000;1000,11;1.000,11
20.000;20000,22;20.000,22
300.000;300000,33;300.000,33
4.000.000;4000000,44;4.000.000,44
5.000.000.000;5000000000,55;5.000.000.000,55"""

df = pd.read_csv(StringIO(data), sep=';', thousands='.', decimal=',')
print df.dtypes
print df

Results in

A      int64
B    float64
C     object
            A             B                 C
0           0  1.100000e-01              0,11
1        1000  1.000110e+03          1.000,11
2       20000  2.000022e+04         20.000,22
3      300000  3.000003e+05        300.000,33
4     4000000  4.000000e+06      4.000.000,44
5  5000000000  5.000000e+09  5.000.000.000,55

wesm · 2012-12-24T17:29:56Z

I'll open a separate issue: currently, thousands separators are not handled at all for floating point numbers.

matthias-ollig · 2013-07-31T12:39:39Z

I wrote a converter that removes the thousand separator and tried to use that in combination with the dtype argument of read_csv without success. What does work though, is removing and casting at the same time:

# this version strips "," as the thousands separator; in your case you want
# to replace the dot instead, as that is your thousands separator
rem_thousand_sep_and_cast_to_float = lambda x: float(x.replace(",", ""))

You can then use that function to convert the desired columns with the converters argument of read_csv. Let me know if that works for you.

Used in an example:

df = pd.io.parsers.read_csv("my.csv", sep=",", thousands=",",
                            converters={"a": rem_thousand_sep_and_cast_to_float,
                                        "b": rem_thousand_sep_and_cast_to_float})
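
For the combined case from the example above (dot as thousands separator, comma as decimal sign), the same idea can be sketched as a converter that handles both characters in one pass. This is only a minimal sketch: parse_european_number is a hypothetical helper name, not part of pandas, and the csv module stands in for read_csv so that the snippet runs with the standard library alone.

```python
import csv
from io import StringIO


def parse_european_number(text, thousands=".", decimal=","):
    """Convert a string such as '1.000,11' to a float.

    Hypothetical helper, not a pandas function: it drops the thousands
    separator first, then swaps the decimal sign for a dot.
    """
    cleaned = text.strip().replace(thousands, "").replace(decimal, ".")
    return float(cleaned)


# A shortened version of the sample data from earlier in this thread.
data = """A;B;C
0;0,11;0,11
1.000;1000,11;1.000,11
4.000.000;4000000,44;4.000.000,44"""

rows = list(csv.DictReader(StringIO(data), delimiter=";"))

# Column C mixes both separators, which is the case reported as failing.
column_c = [parse_european_number(row["C"]) for row in rows]
print(column_c)
```

The function takes a single string and returns a float, so it should also be usable through the converters argument of read_csv, for example converters={"C": parse_european_number}, in the same way as the lambda above.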

adamklein mentioned this issue Feb 15, 2012

Handle European decimal formats in to_csv #781

Closed

wesm closed this as completed Nov 27, 2012

wesm mentioned this issue Dec 24, 2012

Support parsing thousands separators in floating point data #2594

Closed
