make 'read_table()' behave like read.table() #717

Closed
kevinushey opened this issue Sep 25, 2017 · 2 comments
Labels
feature a feature request or enhancement

@kevinushey

Right now, read_table() is (more or less) read_fwf() and read_table2() is read_delimited(); however, most R users expect read_table() to behave like R's own read.table() and to accept a whitespace-delimited file.

IMHO read_table() should just read whitespace-delimited files, and read_table2() shouldn't exist. Users who actually want to read fixed-width files should use read_fwf().

This likely wouldn't break most usages since files readable by read_table() should also be accepted by read_table2().
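
To make the distinction above concrete, here is a minimal base R sketch of the two parsing strategies: splitting on runs of whitespace versus cutting every line at fixed character positions. The helpers `split_ws()` and `split_fixed()` and the column positions are hypothetical and written only for illustration; they are not readr's implementation of read_table(), read_table2(), or read_fwf().

```r
# Hypothetical helpers for illustration; not part of readr or base R.
split_ws <- function(lines) {
  # One field per run of non-whitespace characters.
  strsplit(trimws(lines), "[[:space:]]+")
}

split_fixed <- function(lines, starts, ends) {
  # Cut every line at the same character positions, then trim the padding.
  lapply(lines, function(line) trimws(substring(line, starts, ends)))
}

# Columns line up vertically (fixed-width style).
aligned <- c("a    1 10",
             "bbb 22  3")

# Same values, separated by single spaces, so columns do not line up.
ragged <- c("a 1 10",
            "bbb 22 3")

# Assumed column positions for the aligned layout.
starts <- c(1, 5, 8)
ends   <- c(3, 6, 9)

# Aligned input: both strategies recover the same fields.
same_on_aligned <- identical(split_ws(aligned),
                             split_fixed(aligned, starts, ends))

# Ragged input: the whitespace splitter still finds three fields per line,
# while the fixed positions cut the first line in the wrong places.
ws_ragged    <- split_ws(ragged)
fixed_ragged <- split_fixed(ragged, starts, ends)
```

In this sketch, anything the fixed-position cut handles cleanly is also handled by the whitespace splitter, which is the direction of compatibility the proposal relies on; the reverse does not hold.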

@jimhester jimhester added the feature a feature request or enhancement label Dec 7, 2017
@jimhester
Collaborator

I agree this is confusing; however, this change may break existing behavior silently by allowing inputs which previously failed to be read. Maybe we will decide this is a worthwhile trade-off for simplicity's sake in the future.

@alistaire47

I'd suggest this change is worth consideration. In personal use, I find read_table is far too strict for what I have to throw at it—for the worse stuff, read_table2 is still too strict. In many cases read_fwf is actually easier, when usable. To put some data behind my experience, searching GitHub returns

for a total of 774 files with read_table and 34,543 with read_csv, for a ratio of read_table/read_csv of 0.022406

For R as a whole,

  • 56,517 R and 5,970 Rmd files for "read.table" (which due to inability to limit the search includes read_table as well)
  • 373,612 R and 101,226 Rmd files for "read.csv" (with the same caveat)

for a total of 62,487 files with read.table and 474,838 with read.csv, for a ratio of read.table/read.csv of 0.131596.
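
The totals and ratios quoted above can be recomputed from the stated counts with a few lines of base R. This is only a sketch of the arithmetic; the variable names are made up here, and the counts are copied from this comment rather than re-queried from GitHub.

```r
# File counts as quoted in the comment (not re-queried).
readr_table <- 774
readr_csv   <- 34543

base_table <- 56517 + 5970      # R files + Rmd files matching "read.table"
base_csv   <- 373612 + 101226   # R files + Rmd files matching "read.csv"

# Share of table-style reads relative to CSV reads in each group.
readr_ratio <- readr_table / readr_csv
base_ratio  <- base_table / base_csv

# How many times larger the overall ratio is than the readr ratio.
relative_gap <- base_ratio / readr_ratio

print(c(readr = readr_ratio, base = base_ratio, gap = relative_gap))
```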

Those ratios are non-negligibly different, which means that

  • readr users are more likely to have data in CSV form,
  • when it comes to whitespace-delimited files, they're using something else, or
  • nothing at all, because there's too much error in the data (the numbers do bounce around).

On an absolute scale, the read_table numbers are still relatively small, so while changes may break some code (though likely most would continue to work identically), for now it's not so much that everybody would freak out. Probably.
