This repository has been archived by the owner on May 5, 2019. It is now read-only.

Start replacing Nullable with Null #62

Open
nalimilan opened this issue May 13, 2017 · 3 comments

Comments

@nalimilan
Member

@quinnj has just created the Nulls.jl package, which provides a new Null type to replace DataArrays' NAtype. Even though the Julia compiler doesn't yet include the optimizations needed to handle Union{T, Null} efficiently (see e.g. the discussion at JuliaData/Missings.jl#3), I think we should start moving away from Nullable now, so that we can at least stabilize the API even if performance remains poor for some time.

NullableArray can be replaced with Array{Union{T, Null}}, which Jameson said will eventually use the same memory layout as NullableArray. This should fit well with @cjprybol's PR #53, which is going to stop auto-promoting columns to NullableArray. CategoricalArray and NullableCategoricalArray will have to be adapted, but that shouldn't be too hard.
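To make the proposed representation concrete, here is a minimal sketch. It defines a stand-in `Null` type locally instead of loading Nulls.jl, so `Null`, `null`, and `isnull_value` are assumptions made for illustration and not that package's API.

```julia
# Hypothetical stand-in for the Null type, defined locally so the sketch is self-contained.
struct Null end
const null = Null()

# A column that may hold missing entries is a plain Array with a Union element type.
col = Union{Int, Null}[1, null, 3]

# Hypothetical helper: test whether an entry is null.
isnull_value(x) = x isa Null

# Sum the non-null entries by skipping the null ones.
total = sum(x for x in col if !isnull_value(x))
```

The point of the sketch is that no dedicated array type is needed: missingness lives in the element type, and ordinary `Array` operations apply.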

@ararslan
Member

Why not just do this in DataFrames/DataArrays, since the approach there is already the closest to how Nulls.jl works?

@nalimilan
Member Author

Because storing columns as Array{Union{T, Null}} is going to be quite slow until (at least) Julia 1.0, and because AFAIK we don't want to continue using Nullable in the future. So it's better to keep DataFrames usable for now (maybe porting DataArrays to Null, but keeping them for their efficient memory layout) and apply breaking changes to DataTables, which is still in an experimental state. After Julia 1.0 we should be able to make DataFrames and DataTables converge on a common representation.

@davidanthoff
Contributor

I would much prefer that we keep DataTables.jl as the place for a container-based approach to missing values. At this point it is not clear whether the Union{T,Null} approach can work for the whole data ecosystem (e.g. Query.jl), and I don't think we should start converting anything in this repo until that is sorted out.

Why not do the Nulls.jl work in a branch in DataFrames.jl?
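For contrast, a container-based approach wraps every entry in a value that records whether data is present. The sketch below uses a hypothetical `Maybe{T}` type defined only for illustration; it is loosely modeled on the idea behind Nullable and is not the actual definition used by Base or DataTables.jl.

```julia
# Hypothetical wrapper type: a presence flag stored next to the payload.
struct Maybe{T}
    hasvalue::Bool
    value::T
    # Empty container: the payload field is left uninitialized.
    Maybe{T}() where {T} = new{T}(false)
    # Container holding a value.
    Maybe{T}(v) where {T} = new{T}(true, v)
end

# Every element has the same concrete type, Maybe{Int}.
col = [Maybe{Int}(1), Maybe{Int}(), Maybe{Int}(3)]

# Callers must check the flag before touching the payload.
total = sum(m.value for m in col if m.hasvalue)
```

Here the element type stays concrete, at the cost of unwrapping each entry explicitly before use.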


3 participants