Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

accept PRs to enhance read_fwf to deal with decimal places and overlaping columns #637

Closed
lucasmation opened this issue Mar 15, 2017 · 8 comments
Labels
feature a feature request or enhancement

Comments

@lucasmation
Copy link

as requested, following up on #534 in a new issue

@hadley,
we have been working on PRs to fix the two limitations of read_fwf

Can't deal with decimal places (issue #399). Proposed solution: add a new argument decimal_positions to fwf_positions, modifying the main read_fwf function (pull request #632, implemented in Rcpp, tks to @gutorc92)

Can't deal with overlapping columns (current issue). Proposed solution: parse the import dictionary into a list of dictionaries with no overlapping variables. Then import the data based on each dictionary and cbind the results (pull request #634, implemented in R, tks to @daniellima123). We implemented this as a separate function read_fwf2 (in the same spirit of how read.cvs2 is related to read.cvs), to avoid recursive calls. We could make the changes directly to read_fwf, defining it recursively. What do you think?

Please let us know if these suggestions are acceptable. We have been using them in our forked repos and they seem to work fine.

@lucasmation lucasmation changed the title accept PR to enhance read_fwf do deal with decimal places and overlaping columns accept PRs to enhance read_fwf to deal with decimal places and overlaping columns Mar 16, 2017
@jimhester jimhester added the feature a feature request or enhancement label Dec 7, 2017
@zross
Copy link

zross commented Mar 29, 2018

Just adding my two cents on this old issue. In the original issue (#534) Hadley called this an esoteric use of fwf. This is probably true but it happens in administrative data. The mortality/natality files for the US from the National Center for Health Statistics have overlapping FWF, especially with regard to the FIPS codes for geographic regions. A screenshot of the layout documentation below.

So a +1 to allow overlapping specifications in some of the syntax versions.

image

@billdenney
Copy link
Contributor

I'm having the exact same issue as @zross with the exact same dataset.

I'd propose a relatively simple PR that adds an argument to read_fwf() called allow_overlap which defaults to FALSE (leaving the error as is currently in place-- as described in #534, it's most commonly a data specification error). When allow_overlap=TRUE, it would simply disable that error message.

After this, the error message would be updated to point the user to the allow_overlap argument in case it's intentional.

This is a different suggestion than that in #692 which appears to simply remove the overlap limitation.

Thoughts?

@lucasmation
Copy link
Author

@billdenney , I would be fine with your solution too, either way.

But I don't think @hadley (or Rstudio team) treated this issue fairly. In the original issue, #534 Hadley after saying it was exoteric, said he would consider a PR and closed the issue. We created this issue and submitted the PR, but they did not even bother to look at it. @hadley, I know you are a busy person and admire your work overall, but no answer for these PRs from you or anyone in your team in 1,5 year is disrespectful of our time and not in the spirit of opensource.

@billdenney
Copy link
Contributor

@lucasmation, I understand the frustration, but if I'm reading PR #692 correctly, it mostly undoes PR #585. (I could be missing something there, though.)

I can't speak for @hadley, but I would guess that a PR would be accepted if it does direct goal of allowing overlapping columns in some cases while not undoing prior functionality which covers the common case where overlapping columns would be an accident that should be prevented. #692 undoes functionality that was intentionally included elsewhere; I doubt such a feature is ever likely to be included.

Elsewhere, (in #399) he already stated that he wouldn't accept a PR that added a decimal places argument. Having disagreed with @hadley in other places, I understand when goals don't align, but he did already answer the question in #399 that such a feature wouldn't be accepted-- handling that would be a requirement of other code.

With that long preamble, @gergness, might you be willing to rework your PR #692 so that it adds the allow_overlap=TRUE functionality as described above (also updating the default error message)?

@batpigandme
Copy link
Contributor

@lucasmation, I'm sorry about the delay, and understand your frustration. I can't presume to say when it (the PR) will be addressed, but I'm glad to hear that it's working for you in your fork. 👍 I think @billdenney is on target re. the decimals argument, which, if I'm understanding things correctly, is not strictly necessary for the overlap functionality to be implemented.

@gergness
Copy link
Contributor

I have no strong objections to adapting the PR, but I assume that the reason it hasn't been incorporated has more to do with the issues with CRAN discussed in #856 than about how it wouldn't catch mis-specifications of columns locations.

It also feels worth mentioning that I've been working on a fork of readr that builds more features dedicated to fixed width files called hipread. It grew out of some issues I had while trying to use readr for IPUMS data. In addition to allowing overlapping columns and implicit decimals, it also reads hierarchical files where a single file has rows with different column specifications (a very weird format that seems popular among census agencies because of CSPro).

I cut a lot of readr's features that I didn't need for IPUMS data and I doubt it will ever be as polished as readr, but for people interested in this issue, it may be useful. It's available on CRAN, or you can check it out here: https://github.com/mnpopcenter/hipread

@jadermcs
Copy link

jadermcs commented Sep 21, 2018

The cran/PNADcIBGE#1 has the same issue.
PS: solved

@lock
Copy link

lock bot commented May 15, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators May 15, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature a feature request or enhancement
Projects
None yet
Development

No branches or pull requests

7 participants