Commit

DOC, TST: Clarify whitespace behavior in read_fwf documentation (#16950)
Closes gh-16772
Lucas Kushner authored and gfyoung committed Jul 18, 2017
1 parent 7b9a57f commit fcb0263
Showing 3 changed files with 41 additions and 7 deletions.
6 changes: 5 additions & 1 deletion doc/source/io.rst
@@ -1258,7 +1258,8 @@ Files with Fixed Width Columns

While ``read_csv`` reads delimited data, the :func:`read_fwf` function works
with data files that have known and fixed column widths. The function parameters
to ``read_fwf`` are largely the same as `read_csv` with two extra parameters:
to ``read_fwf`` are largely the same as `read_csv` with two extra parameters, and
a different usage of the ``delimiter`` parameter:

- ``colspecs``: A list of pairs (tuples) giving the extents of the
fixed-width fields of each line as half-open intervals (i.e., [from, to[ ).
@@ -1267,6 +1268,9 @@ to ``read_fwf`` are largely the same as `read_csv` with two extra parameters:
behaviour, if not specified, is to infer.
- ``widths``: A list of field widths which can be used instead of 'colspecs'
if the intervals are contiguous.
- ``delimiter``: Characters to consider as filler characters in the fixed-width file.
Can be used to specify the filler character of the fields
if it is not spaces (e.g., '~').

.. ipython:: python
:suppress:
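
The relationship between ``widths`` and ``colspecs`` described in this hunk can be sketched in plain Python. This is only an illustration of the half-open interval convention, not pandas code; the helper name `widths_to_colspecs` and the sample line are hypothetical.

```python
from itertools import accumulate


def widths_to_colspecs(widths):
    """Turn contiguous field widths into half-open (start, stop) pairs."""
    stops = list(accumulate(widths))   # running total gives each field's end
    starts = [0] + stops[:-1]          # each field starts where the last ended
    return list(zip(starts, stops))


colspecs = widths_to_colspecs([3, 3, 4])
# colspecs == [(0, 3), (3, 6), (6, 10)]

line = "a  bbb1234"
# Python slices are half-open too, so each pair maps directly onto a slice.
fields = [line[start:stop] for start, stop in colspecs]
# fields == ["a  ", "bbb", "1234"]
```

Because the intervals are half-open, the stop of one field equals the start of the next, which is why a plain list of widths is enough when the fields are contiguous.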
13 changes: 7 additions & 6 deletions pandas/io/parsers.py
@@ -63,8 +63,6 @@
file. For file URLs, a host is expected. For instance, a local file could
be file ://localhost/path/to/table.csv
%s
delimiter : str, default ``None``
Alternative argument name for sep.
delim_whitespace : boolean, default False
Specifies whether or not whitespace (e.g. ``' '`` or ``'\t'``) will be
used as the sep. Equivalent to setting ``sep='\s+'``. If this option
@@ -316,7 +314,9 @@
be used automatically. In addition, separators longer than 1 character and
different from ``'\s+'`` will be interpreted as regular expressions and
will also force the use of the Python parsing engine. Note that regex
delimiters are prone to ignoring quoted data. Regex example: ``'\r\t'``"""
delimiters are prone to ignoring quoted data. Regex example: ``'\r\t'``
delimiter : str, default ``None``
Alternative argument name for sep."""

_read_csv_doc = """
Read CSV (comma-separated) file into DataFrame
@@ -341,15 +341,16 @@
widths : list of ints. optional
A list of field widths which can be used instead of 'colspecs' if
the intervals are contiguous.
delimiter : str, default ``'\t' + ' '``
Characters to consider as filler characters in the fixed-width file.
Can be used to specify the filler character of the fields
if it is not spaces (e.g., '~').
"""

_read_fwf_doc = """
Read a table of fixed-width formatted lines into DataFrame
%s
Also, 'delimiter' is used to specify the filler character of the
fields if it is not spaces (e.g., '~').
""" % (_parser_params % (_fwf_widths, ''))


29 changes: 29 additions & 0 deletions pandas/tests/io/parser/test_read_fwf.py
@@ -405,3 +405,32 @@ def test_skiprows_inference_empty(self):

        with pytest.raises(EmptyDataError):
            read_fwf(StringIO(test), skiprows=3)

    def test_whitespace_preservation(self):
        # Addresses Issue #16772
        data_expected = """
 a ,bbb
 cc,dd """
        expected = read_csv(StringIO(data_expected), header=None)

        test_data = """
 a bbb
 ccdd """
        result = read_fwf(StringIO(test_data), widths=[3, 3],
                          header=None, skiprows=[0], delimiter="\n\t")

        tm.assert_frame_equal(result, expected)

    def test_default_delimiter(self):
        data_expected = """
a,bbb
cc,dd"""
        expected = read_csv(StringIO(data_expected), header=None)

        test_data = """
a \tbbb
cc\tdd """
        result = read_fwf(StringIO(test_data), widths=[3, 3],
                          header=None, skiprows=[0])

        tm.assert_frame_equal(result, expected)
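
The behavior these tests exercise can be approximated without pandas. The following is a minimal sketch, assuming each field is sliced by width and the filler characters are then stripped from both ends. The function `split_fixed_width` is hypothetical and is not pandas' implementation; always stripping line endings is likewise an assumption made for the sketch.

```python
def split_fixed_width(line, widths, filler=" \t"):
    """Slice a line into fixed-width fields and strip filler from each one."""
    fields = []
    start = 0
    for width in widths:
        raw = line[start:start + width]
        # Assumption: line endings are always stripped, plus the filler set.
        fields.append(raw.strip("\r\n" + filler))
        start += width
    return fields


# Default filler (spaces and tabs) is removed around each field.
default_result = split_fixed_width("a \tbbb", [3, 3])
# default_result == ["a", "bbb"]

# A custom filler such as "~" is removed instead of spaces.
tilde_result = split_fixed_width("a~~bbb", [3, 3], filler="~")
# tilde_result == ["a", "bbb"]

# A filler set without the space character leaves spaces in the field.
preserved_result = split_fixed_width(" a bbb", [3, 3], filler="\n\t")
# preserved_result == [" a ", "bbb"]
```

The last call mirrors the idea behind `test_whitespace_preservation`: passing a `delimiter` that does not contain a space keeps the surrounding spaces as part of the field value.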
