Add a decimal parameter to read_csv / scan_csv #6698
Comments
As a Latam user, I have run into the same need. I recently opened a PR that lets the Polars API pass this kind of parsing instruction through to pyarrow; the only limitation is that pyarrow is not available in the lazy API with scan_csv. |
I'm curious what the CSV files look like - if they are using a comma inside the number, presumably they must use a different (non-comma) separator? (TAB, perhaps?) Or are all the numeric values typically double-quoted instead? For example...
...or:
|
... I can answer for France, where the semicolon is used as separator. |
The standard file separator for us is the semicolon, so the txt would be like:
Some systems sometimes even use a pipe as the separator, but that is not a problem.
The important detail is that the numbers are not enclosed in quotes and the comma is the decimal separator, therefore: we just swap the dot for the comma. Edit 1: a wiki table that shows the common radix point patterns across the globe. |
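To make the format concrete, here is a minimal sketch using only the Python standard library. The sample data, the column names, and the helper `parse_decimal_comma` are hypothetical and chosen purely for illustration; the sketch assumes a semicolon separator, unquoted numbers, and a comma as the decimal separator, as described above.

```python
import csv
import io

# Hypothetical sample: semicolon-separated, unquoted numbers, comma as decimal mark.
sample = "item;quantity;price\nbolt;3;0,75\nnut;10;1234,5\n"


def parse_decimal_comma(text: str) -> float:
    """Convert a number written with a decimal comma (e.g. '0,75') to a float."""
    return float(text.replace(",", "."))


reader = csv.DictReader(io.StringIO(sample), delimiter=";")
rows = [
    {
        "item": record["item"],
        "quantity": int(record["quantity"]),
        "price": parse_decimal_comma(record["price"]),
    }
    for record in reader
]

for row in rows:
    print(row)
```

Because the field separator is not a comma, the decimal comma never collides with it, so no quoting is needed. A file that also used a thousands separator would need extra handling that this sketch does not attempt.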
@Bebio95 The pyarrow parsing options: |
Ok, looks much as I expected, thanks; I think I can add this facility into the polars-native (Rust) parser at very little cost, but probably not until the weekend 👍 |
Hello, |
Is there any progress on this? It would be great to be able to specify |
This PR only helps the write function; we need this parameter on the read and scan functions too... |
I would appreciate this feature being added too! |
I would love to see this too! |
It seems that the |
@alexander-beedie just going thru my list of CSV issues - did you end up getting this to work? |
I was planning to as our previous float parser did support this; the newer SIMD parser unfortunately does not, so this is currently stuck until such support can be added 😓 |
Hi @alexander-beedie , just wondering what the "newer" SIMD parser is. Is it another crate or does Pola.rs have its own CSV parser implementation? |
We have our own CSV parser, but this is referring to the SIMD string→float parsing library which that CSV parser calls when handling float-like strings. |
I think it would also be really nice to see a thousands parameter just like |
Problem description
As a French user of Polars, it would be very convenient to have a decimal parameter (as in pandas) to specify the decimal separator (',' for France, and I think it is also the case in Germany) and directly obtain the desired DataFrame, without being forced to use a str.replace on every import.