BUG: line terminator and '\n to \r\n' problem in Windows(Issue #20353) #21406

Merged 1 commit on Oct 19, 2018
91 changes: 91 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -235,6 +235,97 @@ If installed, we now require:
| scipy | 0.18.1 | |
+-----------------+-----------------+----------+

.. _whatsnew_0240.api_breaking.csv_line_terminator:

``os.linesep`` is used for ``line_terminator`` of ``DataFrame.to_csv``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`DataFrame.to_csv` now uses :data:`os.linesep` rather than ``'\n'``
for the default line terminator (:issue:`20353`).
This change only affects Windows, where ``'\r\n'`` was previously used as the line terminator
even when ``'\n'`` was passed in ``line_terminator``.

Previous Behavior on Windows:

.. code-block:: ipython

    In [1]: data = pd.DataFrame({
       ...:     "string_with_lf": ["a\nbc"],
       ...:     "string_with_crlf": ["a\r\nbc"]
       ...: })

    In [2]: # When passing file PATH to to_csv, line_terminator does not work, and csv is saved with '\r\n'.
       ...: # Also, this converts all '\n's in the data to '\r\n'.
       ...: data.to_csv("test.csv", index=False, line_terminator='\n')

    In [3]: with open("test.csv", mode='rb') as f:
       ...:     print(f.read())
    b'string_with_lf,string_with_crlf\r\n"a\r\nbc","a\r\r\nbc"\r\n'

    In [4]: # When passing file OBJECT with newline option to to_csv, line_terminator works.
       ...: with open("test2.csv", mode='w', newline='\n') as f:
       ...:     data.to_csv(f, index=False, line_terminator='\n')

    In [5]: with open("test2.csv", mode='rb') as f:
       ...:     print(f.read())
    b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'


New Behavior on Windows:

- Passing ``line_terminator`` explicitly sets the line terminator to that character.
- The value of ``line_terminator`` only affects the line terminator of the CSV output;
  it does not change values inside the data.

.. code-block:: ipython

    In [1]: data = pd.DataFrame({
       ...:     "string_with_lf": ["a\nbc"],
       ...:     "string_with_crlf": ["a\r\nbc"]
       ...: })

    In [2]: data.to_csv("test.csv", index=False, line_terminator='\n')

    In [3]: with open("test.csv", mode='rb') as f:
       ...:     print(f.read())
    b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'


- On Windows, the value of ``os.linesep`` is ``'\r\n'``,
  so if ``line_terminator`` is not set, ``'\r\n'`` is used as the line terminator.
- Again, this does not affect values inside the data.

.. code-block:: ipython

    In [1]: data = pd.DataFrame({
       ...:     "string_with_lf": ["a\nbc"],
       ...:     "string_with_crlf": ["a\r\nbc"]
       ...: })

    In [2]: data.to_csv("test.csv", index=False)

    In [3]: with open("test.csv", mode='rb') as f:
       ...:     print(f.read())
    b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'


- For file objects, specifying ``newline`` is not sufficient to set the line terminator.
  You must pass ``line_terminator`` explicitly, even in this case.

.. code-block:: ipython

    In [1]: data = pd.DataFrame({
       ...:     "string_with_lf": ["a\nbc"],
       ...:     "string_with_crlf": ["a\r\nbc"]
       ...: })

    In [2]: with open("test2.csv", mode='w', newline='\n') as f:
       ...:     data.to_csv(f, index=False)

    In [3]: with open("test2.csv", mode='rb') as f:
       ...:     print(f.read())
    b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'
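
The rule that the terminator applies only at record ends, and never to newlines inside field values, can be reproduced without pandas. The following is a minimal sketch that uses only the standard library ``csv`` module. It is not the pandas implementation, and the helper name ``rows_to_csv`` is hypothetical.

```python
import csv
import io


def rows_to_csv(rows, line_terminator):
    """Serialize rows to CSV text with the given record terminator.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    # newline="" stops StringIO from translating any newline characters.
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator=line_terminator)
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    ["string_with_lf", "string_with_crlf"],
    ["a\nbc", "a\r\nbc"],
]

# Only the record ends differ between the two outputs;
# the embedded '\n' and '\r\n' inside the quoted fields are untouched.
lf_output = rows_to_csv(rows, "\n")
crlf_output = rows_to_csv(rows, "\r\n")
print(repr(lf_output))
print(repr(crlf_output))
```

Passing the terminator explicitly, as in both calls above, gives the same bytes on every platform, which is the practical way to keep output independent of ``os.linesep``.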

.. _whatsnew_0240.api_breaking.interval_values:

``IntervalIndex.values`` is now an ``IntervalArray``
9 changes: 6 additions & 3 deletions pandas/core/generic.py
@@ -9518,7 +9518,7 @@ def last_valid_index(self):
     def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
                columns=None, header=True, index=True, index_label=None,
                mode='w', encoding=None, compression='infer', quoting=None,
-               quotechar='"', line_terminator='\n', chunksize=None,
+               quotechar='"', line_terminator=None, chunksize=None,
                tupleize_cols=None, date_format=None, doublequote=True,
                escapechar=None, decimal='.'):
         r"""
@@ -9583,9 +9583,12 @@ def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
             will treat them as non-numeric.
         quotechar : str, default '\"'
             String of length 1. Character used to quote fields.
-        line_terminator : string, default ``'\n'``
+        line_terminator : string, optional
             The newline character or character sequence to use in the output
-            file.
+            file. Defaults to `os.linesep`, which depends on the OS in which
+            this method is called ('\n' for linux, '\r\n' for Windows, i.e.).
+
+            .. versionchanged:: 0.24.0
         chunksize : int or None
             Rows to write at a time.
         tupleize_cols : bool, default False
5 changes: 3 additions & 2 deletions pandas/io/common.py
@@ -417,13 +417,14 @@ def _get_handle(path_or_buf, mode, encoding=None, compression=None,
     elif is_path:
         if compat.PY2:
             # Python 2
+            mode = "wb" if mode == "w" else mode
             f = open(path_or_buf, mode)
         elif encoding:
             # Python 3 and encoding
-            f = open(path_or_buf, mode, encoding=encoding)
+            f = open(path_or_buf, mode, encoding=encoding, newline="")
         elif is_text:
             # Python 3 and no explicit encoding
-            f = open(path_or_buf, mode, errors='replace')
+            f = open(path_or_buf, mode, errors='replace', newline="")
         else:
             # Python 3 and binary mode
             f = open(path_or_buf, mode)
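
The `newline=""` argument matters because Python's text mode otherwise translates every `'\n'` it writes into the platform separator, including newlines inside quoted fields. Below is a minimal sketch of that effect using only the standard library. As an assumption for portability, `newline="\r\n"` stands in for the Windows default so the translation can be observed on any platform; the helper name `write_then_read_bytes` is hypothetical.

```python
import os
import tempfile


def write_then_read_bytes(text, newline):
    """Write text in text mode with the given newline setting, return raw bytes.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, mode="w", newline=newline) as f:
            f.write(text)
        with open(path, mode="rb") as f:
            return f.read()
    finally:
        os.remove(path)


# CSV text as a writer would emit it with '\n' as the record terminator.
csv_text = 'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'

# newline="\r\n" translates every '\n' written, as the Windows default would.
translated = write_then_read_bytes(csv_text, newline="\r\n")

# newline="" writes the text untouched, which is what the patched open() calls request.
untranslated = write_then_read_bytes(csv_text, newline="")

print(translated)
print(untranslated)
```

The translated bytes match the "previous behavior" output in the whatsnew entry, including the doubled `\r\r\n` inside the second field.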
3 changes: 2 additions & 1 deletion pandas/io/formats/csvs.py
@@ -11,6 +11,7 @@
 from zipfile import ZipFile

 import numpy as np
+import os

 from pandas._libs import writers as libwriters

@@ -73,7 +74,7 @@ def __init__(self, obj, path_or_buf=None, sep=",", na_rep='',
         self.doublequote = doublequote
         self.escapechar = escapechar

-        self.line_terminator = line_terminator
+        self.line_terminator = line_terminator or os.linesep

         self.date_format = date_format
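
One consequence of the `or` fallback is that any falsy value, not only `None`, resolves to `os.linesep`. A minimal standalone sketch of the same expression follows; the function name `resolve_line_terminator` is hypothetical, since the diff performs this inline in `__init__`.

```python
import os


def resolve_line_terminator(line_terminator=None):
    """Return the terminator to use, mirroring the `or` fallback in the diff.

    Hypothetical standalone helper for illustration only.
    """
    return line_terminator or os.linesep


default_terminator = resolve_line_terminator()
explicit_terminator = resolve_line_terminator("\n")
# An empty string is falsy, so it also falls back to os.linesep.
empty_terminator = resolve_line_terminator("")

print(repr(default_terminator), repr(explicit_terminator), repr(empty_terminator))
```

If an empty terminator ever needed to be honored, the check would have to be `line_terminator is None` instead; the diff as written does not distinguish the two cases.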
