skip bug in read_lines and read_fwf if unpaired quotes in the data #991
Comments
This might be the cause of issue #986.
Ouch, that makes sense. We do agree that, for consistency, read_lines() should not care about any characters (quotes or otherwise) present in the input stream, except line endings (CR-LF), correct?
Note: quote characters are relevant for field-oriented readers (read_csv, read_delim) via the "quote" parameter, but they shouldn't matter for read_lines() or read_file(). The authors probably reused some code inadvertently; hopefully they will be kind enough to review this in time, as read_lines() is a major workhorse for most people trying to investigate structural problems with files.
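To make the distinction between field-oriented and line-oriented reading concrete, here is a small sketch in Python (not R, and not readr's code). It uses Python's standard csv module as a stand-in for a field-oriented reader and str.splitlines() as a stand-in for a line reader; the sample text is made up for illustration.

```python
import csv
import io

# Made-up sample: the second field of the first record contains a quoted newline.
text = 'a,"b\nc"\nd,e\n'

# Field-oriented reading: the quoted newline is part of a field,
# so it does not end the record.
records = list(csv.reader(io.StringIO(text, newline="")))

# Line-oriented reading: every newline ends a line, quotes are ordinary characters.
lines = text.splitlines()

print(records)  # [['a', 'b\nc'], ['d', 'e']]
print(lines)    # ['a,"b', 'c"', 'd,e']
```

The same bytes give two records but three lines. A line reader that borrows the quote-aware logic of a field reader will therefore disagree with itself about where lines end.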
Yes. And I think the authors will agree as well, because this happens only with skip; when reading all lines there is no problem, and these quoted newlines are treated as line separators.
i guess one workaround is to simply substitute all quotes for some other unusual char in a file prior to pushing it into
however this defeats the main use of
cheers
Regards
Hi there. Came here to file a new issue as this has bit me quite badly when working on a huge data import project using either
This causes all sorts of issues in userland: in many cases skip silently skips an inconsistent number of data lines because of this bug. Any pointers on how to fix this? I am back to using base R for now, but I would rather use {vroom} for speed...
@jimhester:
We only want to try to find embedded newlines when skipping lines if our format uses quoting. Otherwise we don't want to check for quoted newlines when skipping.
Fixes tidyverse/readr#991 (comment)
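The idea in this commit message can be sketched roughly as follows. This is a Python illustration, not the actual readr or vroom implementation; the function name skip_offset, its quote parameter, and the sample data are all hypothetical. Newlines inside quotes are treated as embedded only when a quote character is supplied; with quote=None (the line-reader case) every newline counts.

```python
from typing import Optional


def skip_offset(text: str, n: int, quote: Optional[str] = None) -> int:
    """Return the index just past the first n counted line breaks in text.

    Hypothetical helper: if quote is given, newlines inside a quoted region
    are treated as embedded and are not counted; if quote is None, every
    newline counts as a line break.
    """
    in_quotes = False
    skipped = 0
    for i, ch in enumerate(text):
        if skipped == n:
            return i
        if quote is not None and ch == quote:
            in_quotes = not in_quotes  # an unpaired quote leaves this stuck "open"
        elif ch == "\n" and not in_quotes:
            skipped += 1
    return len(text)


# Made-up data with stray double quotes on lines 2 and 4.
data = 'a\nb "\nc\nd "\ne\nf\n'

# Quote-aware skipping (appropriate for quoted formats, wrong for plain lines):
quote_aware = data[skip_offset(data, 2, quote='"'):].splitlines()

# Plain skipping (what a line reader should do):
plain = data[skip_offset(data, 2):].splitlines()

print(quote_aware)  # ['e', 'f']
print(plain)        # ['c', 'd "', 'e', 'f']
```

With quote-aware skipping, asking to skip two lines silently drops four, because the stray quotes make the newlines between them look embedded. Passing no quote character restores the expected result.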
Functions read_lines() and read_fwf() don't behave correctly with respect to the skip parameter if there are unpaired double quotes (") in the data.
Created on 2019-04-15 by the reprex package (v0.2.1)