Handle European decimal formats in parsers at a lower level #584

Closed
wesm opened this issue Jan 6, 2012 · 4 comments
Labels: Enhancement, IO Data

Comments

@wesm
Member

wesm commented Jan 6, 2012

No description provided.

@wesm
Member Author

wesm commented Nov 27, 2012

Done in new parser engine

@wesm wesm closed this as completed Nov 27, 2012
@keluc

keluc commented Dec 18, 2012

It seems that parsing works when a value contains only the decimal sign or only the thousands separator, but not when a value combines both.
Reopen the issue?

Example

from io import StringIO

import pandas as pd

# Semicolon-delimited data: '.' as thousands separator, ',' as decimal sign
data = """A;B;C
0;0,11;0,11
1.000;1000,11;1.000,11
20.000;20000,22;20.000,22
300.000;300000,33;300.000,33
4.000.000;4000000,44;4.000.000,44
5.000.000.000;5000000000,55;5.000.000.000,55"""

df = pd.read_csv(StringIO(data), sep=';', thousands='.', decimal=',')
print(df.dtypes)
print(df)

Results in

A      int64
B    float64
C     object
            A             B                 C
0           0  1.100000e-01              0,11
1        1000  1.000110e+03          1.000,11
2       20000  2.000022e+04         20.000,22
3      300000  3.000003e+05        300.000,33
4     4000000  4.000000e+06      4.000.000,44
5  5000000000  5.000000e+09  5.000.000.000,55

@wesm
Member Author

wesm commented Dec 24, 2012

I'll open a separate issue: currently, thousands separators are not handled at all for floating-point numbers.
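
To illustrate the gap: a value that uses both separators has to be normalized in two steps before it can be cast to a float. Below is a minimal sketch in plain Python, assuming '.' as the thousands separator and ',' as the decimal sign. `parse_european_float` is a hypothetical helper for illustration only; it is not a pandas function and does not claim to mirror how the parser engine works internally.

```python
def parse_european_float(text, thousands=".", decimal=","):
    """Hypothetical helper (not a pandas API): parse strings such as '1.000,11'."""
    # Step 1: drop the grouping character. Step 2: use '.' as the decimal point.
    normalized = text.replace(thousands, "").replace(decimal, ".")
    return float(normalized)


print(parse_european_float("0,11"))          # 0.11
print(parse_european_float("1.000,11"))      # 1000.11
print(parse_european_float("4.000.000,44"))  # 4000000.44
```

The order of the two replacements matters: swapping the decimal sign first would turn the comma into a dot that the thousands step then removes.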

@matthias-ollig

I wrote a converter that removes the thousands separator and tried to use it in combination with the dtype argument of read_csv, without success. What does work, though, is removing the separator and casting in the same converter:

# This strips "," as the thousands separator; in your case, replace the dot
# instead, since that is your thousands separator.
rem_thousand_sep_and_cast_to_float = lambda x: float(x.replace(",", ""))

You can then use that function to convert the desired columns with the converters argument of read_csv. Let me know if that works for you.

Used in an example:

df = pd.read_csv("my.csv", sep=",", thousands=",",
                 converters={"a": rem_thousand_sep_and_cast_to_float,
                             "b": rem_thousand_sep_and_cast_to_float})

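Applied to the semicolon-delimited sample reported earlier in this thread, the converter idea might look like the sketch below. It assumes '.' is the thousands separator and ',' the decimal sign, and it shortens the data to three rows. `european_to_float` is a hypothetical name chosen for illustration, not part of pandas.

```python
from io import StringIO

import pandas as pd

# Shortened sample: ';' delimiter, '.' thousands separator, ',' decimal sign.
data = """A;B;C
0;0,11;0,11
1.000;1000,11;1.000,11
20.000;20000,22;20.000,22"""


def european_to_float(text):
    """Hypothetical helper: drop '.' separators, then read ',' as the decimal sign."""
    return float(text.replace(".", "").replace(",", "."))


# A converter receives the raw cell text, so column C is cast by the helper
# rather than by read_csv's own numeric parsing.
df = pd.read_csv(
    StringIO(data),
    sep=";",
    thousands=".",
    decimal=",",
    converters={"C": european_to_float},
)
print(df["C"].tolist())
```

Only column C is routed through the converter here, since it is the column that mixes both separators.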