BUG: Use correct line terminator on Windows
* Use OS line terminator if none is provided
* Enforce line terminator selection if one is

Originally authored by @deflatSOCO, but reapplied
by @gfyoung due to enormous merge conflicts.

Closes gh-20353.
gfyoung committed Oct 19, 2018
1 parent ecc5cbc commit 23ce5c6
Showing 9 changed files with 468 additions and 104 deletions.
91 changes: 91 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -235,6 +235,97 @@ If installed, we now require:
| scipy | 0.18.1 | |
+-----------------+-----------------+----------+

.. _whatsnew_0240.api_breaking.csv_line_terminator:

``os.linesep`` is used for ``line_terminator`` of ``DataFrame.to_csv``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`DataFrame.to_csv` now uses :data:`os.linesep` rather than ``'\n'``
as the default line terminator (:issue:`20353`).
This change only affects Windows, where ``'\r\n'`` was previously written as the line terminator
when a file path was given, even if ``'\n'`` was passed as ``line_terminator``.

Previous Behavior on Windows:

.. code-block:: ipython

In [1]: data = pd.DataFrame({
...: "string_with_lf": ["a\nbc"],
...: "string_with_crlf": ["a\r\nbc"]
...: })

In [2]: # When passing file PATH to to_csv, line_terminator does not work, and csv is saved with '\r\n'.
...: # Also, this converts all '\n's in the data to '\r\n'.
...: data.to_csv("test.csv", index=False, line_terminator='\n')

In [3]: with open("test.csv", mode='rb') as f:
...: print(f.read())
b'string_with_lf,string_with_crlf\r\n"a\r\nbc","a\r\r\nbc"\r\n'

In [4]: # When passing file OBJECT with newline option to to_csv, line_terminator works.
...: with open("test2.csv", mode='w', newline='\n') as f:
...: data.to_csv(f, index=False, line_terminator='\n')

In [5]: with open("test2.csv", mode='rb') as f:
...: print(f.read())
b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'


New Behavior on Windows:

- Passing ``line_terminator`` explicitly sets the line terminator to that value.
- The value of ``line_terminator`` only affects the line terminator of the CSV output;
  it does not change values inside the data.

.. code-block:: ipython

In [1]: data = pd.DataFrame({
...: "string_with_lf": ["a\nbc"],
...: "string_with_crlf": ["a\r\nbc"]
...: })

In [2]: data.to_csv("test.csv", index=False, line_terminator='\n')

In [3]: with open("test.csv", mode='rb') as f:
...: print(f.read())
b'string_with_lf,string_with_crlf\n"a\nbc","a\r\nbc"\n'


- On Windows, the value of ``os.linesep`` is ``'\r\n'``,
  so if ``line_terminator`` is not set, ``'\r\n'`` is used as the line terminator.
- Again, this does not affect values inside the data.

.. code-block:: ipython

In [1]: data = pd.DataFrame({
...: "string_with_lf": ["a\nbc"],
...: "string_with_crlf": ["a\r\nbc"]
...: })

In [2]: data.to_csv("test.csv", index=False)

In [3]: with open("test.csv", mode='rb') as f:
...: print(f.read())
b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'


- For file objects, specifying ``newline`` is not sufficient to set the line terminator.
  You must pass ``line_terminator`` explicitly, even in this case.

.. code-block:: ipython

In [1]: data = pd.DataFrame({
...: "string_with_lf": ["a\nbc"],
...: "string_with_crlf": ["a\r\nbc"]
...: })

In [2]: with open("test2.csv", mode='w', newline='\n') as f:
...: data.to_csv(f, index=False)

In [3]: with open("test2.csv", mode='rb') as f:
...: print(f.read())
b'string_with_lf,string_with_crlf\r\n"a\nbc","a\r\nbc"\r\n'
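
The rule that the terminator ends rows but never rewrites newlines embedded in the data can be sketched with the standard-library ``csv`` module. This is an illustration under assumptions, not the pandas implementation: ``rows_to_csv`` is a hypothetical helper, and ``'\r\n'`` stands in for the value of ``os.linesep`` on Windows.

```python
import csv
import io


def rows_to_csv(rows, line_terminator):
    """Serialize rows with the given row terminator (hypothetical helper).

    newline="" tells the text buffer not to translate any newline
    characters, so only csv.writer decides how each row ends.
    """
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator=line_terminator)
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    ["string_with_lf", "string_with_crlf"],
    ["a\nbc", "a\r\nbc"],
]

# Explicit '\n': rows end in '\n', embedded newlines are left alone.
lf_output = rows_to_csv(rows, "\n")

# '\r\n' (what os.linesep is on Windows): rows end in '\r\n',
# embedded newlines are still left alone.
crlf_output = rows_to_csv(rows, "\r\n")
```

In both results the quoted fields keep their original ``'\n'`` and ``'\r\n'``; only the characters at the end of each row differ.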

.. _whatsnew_0240.api_breaking.interval_values:

``IntervalIndex.values`` is now an ``IntervalArray``
9 changes: 6 additions & 3 deletions pandas/core/generic.py
@@ -9518,7 +9518,7 @@ def last_valid_index(self):
def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
columns=None, header=True, index=True, index_label=None,
mode='w', encoding=None, compression='infer', quoting=None,
- quotechar='"', line_terminator='\n', chunksize=None,
+ quotechar='"', line_terminator=None, chunksize=None,
tupleize_cols=None, date_format=None, doublequote=True,
escapechar=None, decimal='.'):
r"""
@@ -9583,9 +9583,12 @@ def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
will treat them as non-numeric.
quotechar : str, default '\"'
String of length 1. Character used to quote fields.
- line_terminator : string, default ``'\n'``
+ line_terminator : string, optional
The newline character or character sequence to use in the output
- file.
+ file. Defaults to `os.linesep`, which depends on the OS in which
+ this method is called ('\n' for linux, '\r\n' for Windows, i.e.).
+ .. versionchanged:: 0.24.0
chunksize : int or None
Rows to write at a time.
tupleize_cols : bool, default False
5 changes: 3 additions & 2 deletions pandas/io/common.py
@@ -417,13 +417,14 @@ def _get_handle(path_or_buf, mode, encoding=None, compression=None,
elif is_path:
if compat.PY2:
# Python 2
+ mode = "wb" if mode == "w" else mode
f = open(path_or_buf, mode)
elif encoding:
# Python 3 and encoding
- f = open(path_or_buf, mode, encoding=encoding)
+ f = open(path_or_buf, mode, encoding=encoding, newline="")
elif is_text:
# Python 3 and no explicit encoding
- f = open(path_or_buf, mode, errors='replace')
+ f = open(path_or_buf, mode, errors='replace', newline="")
else:
# Python 3 and binary mode
f = open(path_or_buf, mode)
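
Why ``newline=""`` matters here can be shown without pandas. In Python 3 text mode, the ``newline`` argument of ``open`` controls whether each ``'\n'`` written is translated on the way to disk. The sketch below is an illustration under assumptions: ``write_and_read_back`` is a hypothetical helper, and ``newline="\r\n"`` is used to reproduce on any platform the translation that Windows applies by default.

```python
import os
import tempfile


def write_and_read_back(text, newline):
    """Write text in text mode with the given newline setting and
    return the raw bytes that ended up on disk (hypothetical helper)."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, mode="w", newline=newline) as f:
            f.write(text)
        with open(path, mode="rb") as f:
            return f.read()
    finally:
        os.remove(path)


# A CSV row whose fields contain an embedded LF and an embedded CRLF.
row = '"a\nbc","a\r\nbc"\n'

# Translation on: every '\n' written becomes '\r\n', including the
# ones inside the data, so the embedded CRLF turns into '\r\r\n'.
translated = write_and_read_back(row, newline="\r\n")

# Translation off: the bytes on disk match what was written.
untouched = write_and_read_back(row, newline="")
```

With translation enabled the file contains ``b'"a\r\nbc","a\r\r\nbc"\r\n'``, the same corruption shown under the previous Windows behavior; with ``newline=""`` the row is written unchanged.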
3 changes: 2 additions & 1 deletion pandas/io/formats/csvs.py
@@ -11,6 +11,7 @@
from zipfile import ZipFile

import numpy as np
+ import os

from pandas._libs import writers as libwriters

@@ -73,7 +74,7 @@ def __init__(self, obj, path_or_buf=None, sep=",", na_rep='',
self.doublequote = doublequote
self.escapechar = escapechar

- self.line_terminator = line_terminator
+ self.line_terminator = line_terminator or os.linesep

self.date_format = date_format
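
The fallback introduced in this hunk can be read in isolation. The sketch below mirrors the ``line_terminator or os.linesep`` expression in a standalone function; ``resolve_line_terminator`` is a hypothetical name used only for illustration and does not exist in pandas.

```python
import os


def resolve_line_terminator(line_terminator=None):
    """Return the terminator to use for CSV rows (hypothetical helper).

    Mirrors the `line_terminator or os.linesep` expression: any falsy
    value, which includes both None and the empty string, falls back
    to the platform default.
    """
    return line_terminator or os.linesep


default_terminator = resolve_line_terminator()        # platform default
explicit_terminator = resolve_line_terminator("\n")   # caller's choice wins
empty_terminator = resolve_line_terminator("")        # also falls back
```

One consequence of using ``or`` rather than an ``is None`` check is that an empty string is treated the same as an unset argument.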
