fix(python,rust): read_csv preserve whitespace and newlines #13934
Fixes #13933 and partly fixes #12763.

Addresses a few issues with `read_csv` (adding issue links later).

Fix 1: preserve whitespace at start of line (currently trimmed)

Currently `read_csv` trims whitespace at the start of each line, so only the first value of a line is affected. This is a bug: the whitespace belongs to the value and should be preserved as is.

Fix 2: preserve newlines (see #13933)

Currently `read_csv` removes empty lines if the CSV has multiple columns, but keeps empty lines as `null` for a CSV with a single column. This is inconsistent. All whitespace in a CSV file belongs to the values and should be preserved.

If/when this is merged, I can also add a `skip_empty_lines` parameter to optionally skip empty lines (like pandas has). But the default should be to preserve any form of whitespace, as it belongs to the format.
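The intended semantics can be sketched in plain Python. This is a minimal illustration and not the Polars implementation: `parse_preserving` is a hypothetical helper, it ignores quoting, and treating an empty line as a row of nulls in the multi-column case is an assumption extrapolated from the single-column behavior described above. The `skip_empty_lines` flag mirrors the proposed (not yet existing) parameter.

```python
from typing import List, Optional, Tuple

Row = List[Optional[str]]


def parse_preserving(
    text: str, separator: str = ",", skip_empty_lines: bool = False
) -> Tuple[List[str], List[Row]]:
    """Hypothetical reference parser: no trimming, empty lines become null rows."""
    lines = text.split("\n")
    # A trailing newline terminates the last record; it is not an empty line.
    if lines and lines[-1] == "":
        lines.pop()
    if not lines:
        return [], []

    header = lines[0].split(separator)
    width = len(header)
    rows: List[Row] = []
    for line in lines[1:]:
        if line == "":
            if skip_empty_lines:
                continue
            # Assumption: an empty line yields one row of nulls.
            rows.append([None] * width)
            continue
        # No strip() here, so leading whitespace stays part of the first value.
        fields: Row = list(line.split(separator))
        # Pad short rows with nulls so every row has `width` fields.
        fields += [None] * (width - len(fields))
        rows.append(fields)
    return header, rows


text = "a,b\n  1,x\n\n2,y\n"
header, rows = parse_preserving(text)
skipped_header, skipped_rows = parse_preserving(text, skip_empty_lines=True)
```

With the default, `rows` keeps the leading spaces in `"  1"` and contains a null row for the empty line. With `skip_empty_lines=True` the empty line is dropped, while the leading whitespace is still preserved.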