Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shorthand for AbstractArray conversions #60

Closed
bkamins opened this issue Nov 30, 2017 · 6 comments
Closed

Shorthand for AbstractArray conversions #60

bkamins opened this issue Nov 30, 2017 · 6 comments

Comments

@bkamins
Copy link
Member

bkamins commented Nov 30, 2017

It would be nice to be have the following two functions:

  1. takes AbstractArray{T} and returns AbstractArray{Union{T, Missing}};
  2. takes AbstractArray{Union{T, Missing}} and returns AbstractArray{T} or throws an error if it contained any `missing.

It now can be achieved e.g. using convert but I could not think of any more compact and handy conversion method (if there is such it would be great to document it).

If it is considered to be worth to add such methods there are two corner cases that is also currently present in Missings.T function:

julia> x = Union{Missing, Int, Float64}[1, 1.0, missing]
3-element Array{Union{Float64, Int64, Missings.Missing},1}:
 1
 1.0
  missing

julia> Missings.T(x)
ERROR: MethodError: no method matching T(::Array{Union{Float64, Int64, Missings.Missing},1})
Closest candidates are:
  T(::Type{Missings.Missing}) at D:\Software\JULIA_PKG\v0.6\Missings\src\Missings.jl:41
  T(::Type{Any}) at D:\Software\JULIA_PKG\v0.6\Missings\src\Missings.jl:43
  T(::Type{Union{Missings.Missing, T1}}) where T1 at D:\Software\JULIA_PKG\v0.6\Missings\src\Missings.jl:40
  ...

This should probably return Union{Int, Float64}.

The other is:

julia> Missings.T([1,2,3])
ERROR: MethodError: no method matching T(::Array{Int64,1})
Closest candidates are:
  T(::Type{Missings.Missing}) at D:\Software\JULIA_PKG\v0.6\Missings\src\Missings.jl:41
  T(::Type{Any}) at D:\Software\JULIA_PKG\v0.6\Missings\src\Missings.jl:43
  T(::Type{Union{Missings.Missing, T1}}) where T1 at D:\Software\JULIA_PKG\v0.6\Missings\src\Missings.jl:40
  ...

which should probably be Int and not error.

@ararslan
Copy link
Member

ararslan commented Dec 1, 2017

Your 1. there is allowmissing!, though that might be in DataFrames rather than here, I can't recall. 2. is dealt with using an iterator that fails on missing during iteration.

Missings.T is intended to act on types rather than values, so I don't think we should add other methods to it. It's possible to make it work for larger unions with something like this:

Missings.T(U::Type{Union{S,Missing}}) where {S} =
    Union{filter!(t -> !(t <: Missing), Base.uniontypes(U))...}

but I don't know how practically useful that would be.

@bkamins
Copy link
Member Author

bkamins commented Dec 1, 2017

Thanks for a prompt answer.

Clear with Missings.T - after thinking I agree to have having it only for types (and it already works for unions, i.e. Missings.T(Union{Int, Float64, Missing}) ==Union{Float64, Int64}).

Regarding the main issue of conversions:

  1. allowmissing! is in DataFrames - I would find it natural to also have something similar in Missing - e.g. we have missings function here although fill could be simply used instead. An example implementation could be (I do not know if it is optimal - maybe some tweaking to ensure better type inference is needed - but gives the idea):
function allowmissing(x::AbstractArray)
    y = similar(x, Union{Missing, eltype(x))
    for i in eachindex(x)
        y[i] = x[i]
    end
end
  1. regarding conversion from missing - you are right and actually type broadcast is probably simplest, e.g. Int.(x) if x is an array containing missings. So nothing new is needed here.

@nalimilan
Copy link
Member

The problem with allowmissing is that it should ideally just call convert, but that doesn't work for all arrays. For example, BitArray cannot allow for missing values. Using similar is a poor fallback since it will always allocate a copy, while in most cases the compiler should (at some point) be able to reuse the same memory (and just allocate the missingness mask).

That's why allowmissing in DataFrames is more limited and converts to Array{Union{T, Missing}}. We could always use that solution as a fallback, and leave custom arrays override it as needed. We could also be more ambitious and add a toeltype() function like this:

toeltype(::Type{T}, x::AbstractArray) where {T} = convert(Array{T}, x)

Then we could define:

allowmissing(x::AbstractArray) = toeltype(Union{eltype(x), Missing}, x)

We could possibly be even more ambitious and introduce a function like similar, but taking an array type (rather than an instance) and returning a type. similar could use it as a fallback, and it could be part of the AbstractArray interface. I've felt the need for this kind of function elsewhere.

Overall it seems this will need to be in Base to be fully generic, which makes sense since Missing is going be to included in Base.

@nalimilan
Copy link
Member

Actually it looks like the needed functionality is already there. We could just define this function:

allowmissing(x::AbstractArray) = convert(AbstractArray{Union{eltype(x), Missing}}, x)

We should start using this immediately in DataFrames too, since it fixes the problem with CategoricalArray (JuliaData/DataFrames.jl#1294).

(The current issue is simpler than similar since we don't need to handle a possibly different dimensionality than the original array, which is complex e.g. for SparseVector vs. SparseMatrixCSC.)

@bkamins
Copy link
Member Author

bkamins commented Dec 1, 2017

@nalimilan - excellent solution. I was unaware that this would work.
Are you preparing a PR with the appropriate changes in the whole DataFrames ecosystem?

@nalimilan
Copy link
Member

@cjprybol Beat me to the DataFrames part. See #64 for the Missings part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants