Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow either DataValues or Missings for representing missing data #34

Open
dmbates opened this issue Jan 13, 2018 · 6 comments
Open

Allow either DataValues or Missings for representing missing data #34

dmbates opened this issue Jan 13, 2018 · 6 comments

Comments

@dmbates
Copy link
Contributor

dmbates commented Jan 13, 2018

@davidanthoff I wanted to run this by you before preparing a PR.

I think it should be possible to represent missing data values in the result using either the DataValues package or the Missings package. Suppose that all the read_* functions had an optional Bool argument to control this choice with its value being stored in the ReadStatDataFrame. The handle_variable! function then ends with

    push!(ds.data, ds.useMissings ? fill(Union{jtype, Missing}(missing), ds.rows) : DataValueVector{ds.rows))

I think the only additional changes would be to use

dest::Union{DataValueVector{T}, Vector{Union{Missing,T}}}

in the signature of the readfield! methods and to add the Missings package.

Shall I prepare a PR for your consideration?

@davidanthoff
Copy link
Member

What is the use-case for this? https://github.com/davidanthoff/StatFiles.jl should support getting a DataFrame that uses Missing, and my idea here was that ReadStat.jl really is just a low-level library that most folks don't really interact with.

@dmbates
Copy link
Contributor Author

dmbates commented Jan 18, 2018

I believe that as of version 0.7 of Julia, the Missing type, the missing value and the ismissing function will be part of Base.

@davidanthoff
Copy link
Member

Yes, that is so. Should we revisit once 0.7 is out (or at least a beta or something like that)? There are some really complicated integration questions on how all of this interacts with Query.jl that just will require more time.

But I do hope that in the meantime the approach with StatFiles.jl should work for folks that want a DataFrame that has Vector{Union{T,Missing}} columns, that should all just work already.

@dmbates
Copy link
Contributor Author

dmbates commented Jan 18, 2018

I agree that waiting until 0.7 is at least in beta is a good idea.

@nalimilan
Copy link
Contributor

I don't think 0.7 will change a lot of things (except performance) here. You could as well use Missings.jl for now, it should provide the same features on 0.6.

@dmbates
Copy link
Contributor Author

dmbates commented Jan 20, 2018

I have a version using Missing in the julia7 branch in https://github.com/dmbates/ReadStat.jl

I haven't created a pull request yet. I wanted to check first with @davidanthoff to see if he would be open to this approach. This version works on Julia 0.6.2 and Julia 0.7.0-dev.

Tests don't work because this returns Vector{Union{Missing,T}} in the data field. I can add/change tests if this approach is acceptable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants