Skip to content
This repository has been archived by the owner on May 4, 2019. It is now read-only.

address vcat return inconsistency #187

Closed
wants to merge 13 commits into from
Closed

address vcat return inconsistency #187

wants to merge 13 commits into from

Conversation

cjprybol
Copy link
Contributor

@cjprybol cjprybol commented Feb 24, 2017

@cjprybol cjprybol changed the title address https://github.com/JuliaStats/NullableArrays.jl/issues/167 address vcat return inconsistency Feb 24, 2017
@codecov-io
Copy link

codecov-io commented Feb 24, 2017

Codecov Report

Merging #187 into master will increase coverage by 0.05%.
The diff coverage is 75%.

@@            Coverage Diff             @@
##           master     #187      +/-   ##
==========================================
+ Coverage   64.56%   64.62%   +0.05%     
==========================================
  Files          13       13              
  Lines         697      701       +4     
==========================================
+ Hits          450      453       +3     
- Misses        247      248       +1
Impacted Files Coverage Δ
src/nullablevector.jl 86.44% <75%> (-0.41%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b584c81...2e2b98a. Read the comment docs.

@nalimilan
Copy link
Member

Thanks, but I'd really like to avoid having to copy so much code from Base. This is JuliaLang/julia#2326.

In the short term, I'm not sure what's the best approach. Isn't there a way to use the machinery from Base without reimplementing the full method?

@@ -311,3 +311,80 @@ function Base.empty!(X::NullableVector)
empty!(X.isnull)
return X
end

function Base.promote_rule{T1,T2}(::Type{T1}, ::Type{Nullable{T2}})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not actually needed (and would have to live in Base anyway). Indeed you should never call promote_rule directly, but use promote_type instead. This already works:

julia> promote_type(Nullable{Int64}, Int64)
Nullable{Int64}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. I see now that promote_type has the consistency I was expecting from promote_rule. I'll need to learn more about those differences.

julia> promote_rule(Nullable{Int64}, Int64)
Nullable{Int64}

julia> promote_rule(Int64, Nullable{Int64})
Union{}

julia> promote_type(Nullable{Int64}, Int64)
Nullable{Int64}

julia> promote_type(Int64, Nullable{Int64})
Nullable{Int64}

@nalimilan
Copy link
Member

I guess one partial solution is to define methods like vcat(::AbstractArray, ::NullableArray, ::AbstractArray...) which call Base.type_hcat with the correct return type. Cf. JuliaStats/DataArrays.jl#213.

promote_rule(Nullable{T2}, T1)
end

function Base.typed_hcat{T}(::Type{T}, A::AbstractVecOrMat...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not OK to override Base methods like this with signatures that only involve Base types. That will affect unrelated packages and is known as type piracy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes perfect sense. Thanks for explaining what type piracy is

@cjprybol cjprybol changed the title address vcat return inconsistency [WIP] address vcat return inconsistency Feb 25, 2017
@cjprybol
Copy link
Contributor Author

I need to figure out how to handle the 3+ argument case but this is working nicely for the 2 argument cases

julia> @test typeof(vcat(1:2, NullableArray(3:4))) == NullableArrays.NullableArray{Int64,1}
Test Passed
  Expression: typeof(vcat(1:2,NullableArray(3:4))) == NullableArrays.NullableArray{Int64,1}
   Evaluated: NullableArrays.NullableArray{Int64,1} == NullableArrays.NullableArray{Int64,1}

julia> @test typeof(vcat([1 2], NullableArray([3 4]))) == NullableArrays.NullableArray{Int64,2}
Test Passed
  Expression: typeof(vcat([1 2],NullableArray([3 4]))) == NullableArrays.NullableArray{Int64,2}
   Evaluated: NullableArrays.NullableArray{Int64,2} == NullableArrays.NullableArray{Int64,2}

julia> @test typeof(hcat(1:2, NullableArray(3:4))) == NullableArrays.NullableArray{Int64,2}
Test Passed
  Expression: typeof(hcat(1:2,NullableArray(3:4))) == NullableArrays.NullableArray{Int64,2}
   Evaluated: NullableArrays.NullableArray{Int64,2} == NullableArrays.NullableArray{Int64,2}

julia> @test typeof(hcat([1 2], NullableArray([3 4]))) == NullableArrays.NullableArray{Int64,2}
Test Passed
  Expression: typeof(hcat([1 2],NullableArray([3 4]))) == NullableArrays.NullableArray{Int64,2}
   Evaluated: NullableArrays.NullableArray{Int64,2} == NullableArrays.NullableArray{Int64,2}

@@ -311,3 +311,14 @@ function Base.empty!(X::NullableVector)
empty!(X.isnull)
return X
end

notnullarray = Union{filter!(x -> !isa(x, Type{NullableArray}), subtypes(AbstractArray))...}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need this. Just AbstractArray should be enough, and the most specific method will take precedence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When usingAbstractArray there's no way to exit the vcat -> vcat loop and it results in StackOverflow. We either eternally prepend empty NullableArrays (vcat(NullableVector{eltype(A)}(), A, B)) or eternally try to convert A to a NullableArray (vcat(NullableArray(A), B))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now realize that if I call typed_hcat, typed_vcat instead of calling hcat/vcat I can probably avoid the StackOverflow while using the simpler AbstractArray as you suggested.

notnullmatrix = typeintersect(notnullarray, AbstractMatrix)
notnullvector = typeintersect(notnullarray, AbstractVector)

Base.vcat{T <: notnullarray}(A::T, B::NullableArray) = Base.vcat(NullableArray(A), B)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wasteful since it creates a copy. Maybe we can use this trick to force creating a NullableArray: vcat(NullableArray{eltype(A)}(), A, B).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, no need for Base..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this case of adding an empty NullableArray to the beginning of the vcat/hcat call only works on vcat with Vectors, but I've added it for that case. Every other case I tried will fail in dimension mismatch checks

src/operators.jl Outdated
@@ -1,7 +1,7 @@
## Lifted operators

importall Base.Operators
import Base: promote_op, abs, abs2, sqrt, cbrt, scalarmin, scalarmax, isless
import Base: abs, abs2, cbrt, isless, promote_op, scalarmin, scalarmax, sqrt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted

@@ -223,4 +223,21 @@ module TestNullableVector
X = NullableArray(A, M)
empty!(X)
@test isempty(X)

@test typeof(hcat(NullableArray(1:2), 3:4)) == NullableArrays.NullableArray{Int64,2}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for NullableArrays..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@@ -1,3 +1,5 @@
import Base: hcat, vcat
Copy link
Contributor Author

@cjprybol cjprybol Feb 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file uses Base. to prefix the function extensions. Should I add all of the functions to this import call? It's an unrelated change but importing these two and leaving everything else to be Base. is inconsistent

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, better not import anything and use the Base. prefix when overloading methods.

@cjprybol cjprybol changed the title [WIP] address vcat return inconsistency address vcat return inconsistency Feb 25, 2017
@cjprybol
Copy link
Contributor Author

Should these promote to NullableCategoricalArray?

julia> vcat(categorical([1,2,3]), NullableArray([4,5,6]))
6-element NullableArrays.NullableArray{Int64,1}:
 1
 2
 3
 4
 5
 6

julia> vcat(NullableArray([4,5,6]), categorical([1,2,3]))
6-element NullableArrays.NullableArray{Int64,1}:
 4
 5
 6
 1
 2
 3

@@ -311,3 +313,27 @@ function Base.empty!(X::NullableVector)
empty!(X.isnull)
return X
end

function vcat(A::AbstractVector, B::AbstractVector...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still type piracy. There needs to be NullableArray in the signature. Currently I don't think this can be fixed in full generality. We can include a few methods for all possible combinations up to e.g. 3 arguments, i.e:

  • ::NullableArray, ::AbstractArray
  • ::AbstractArray, ::NullableArray
  • ::NullableArray, ::AbstractArray, ::AbstractArray
  • ::AbstractArray, ::NullableArray, ::AbstractArray
  • ::AbstractArray, ::AbstractArray, ::NullableArray

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think defining these functions would be a regression compared to just leaving the current functionality as is.

# this rule seems more intuitive
isa(array[1], NullableArray) ? NullableArray{T} : Array[Nullable{T}]

# than this rule
isa(array[1], NullableArray) ||
isa(array[2], NullableArray) ||
isa(array[3], NullableArray) ? NullableArray{T} : Array[Nullable{T}]

If the general functionality is type piracy then I feel the best way forward would be to either address JuliaLang/julia#2326 directly or accept the current behavior. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't know. Clearly we can't reach a satisfying state without improvements in Julia. But which of the partial solutions is better, I don't know...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar sentiments here. I'll implement the 2 argument cases for hcat/vcat and now at least we have everything written up and documented.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you implement the two-argument versions, please also implement the three-argument versions. Else the rule is going to be even more complex than what you wrote above.

Of course, this rests on the assumption that concatenating more than 3 arrays is a somewhat rare operation, so that it's not a big issue if other variants behave differently. Might not be the case, but only special-casing two-argument methods is clearly a worse solution.

@nalimilan
Copy link
Member

Should these promote to NullableCategoricalArray?

Yes, but that's tricky since doing so forces one package to depend on the other one, and as long as StatsModels depends on CategoricalArrays we'd better get rid of the NullableArrays dependency (JuliaData/CategoricalArrays.jl#55). So for now I would leave this case as is.

typed_vcat(promote_eltype(A, B...), NullableArray(A), B...) :
typed_vcat(promote_eltype(A, B...), A, B...)
function vcat(A::AbstractVector, B::NullableVector)
typed_vcat(promote_eltype(A, B), NullableArray(A), B)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reduce the number of spaces used for indentation here by 50%

@cjprybol
Copy link
Contributor Author

cjprybol commented Mar 4, 2017

I think we should close this. I read through a few dozen pages of search results for vcat usage in Julia code here on GitHub and saw a usage pattern ~ vcat(a,b) >> vcat(a...) ~= vcat(a,b,c) >> vcat(a,b,c,d). If we could redirect this into a brainstorm about how to address JuliaLang/julia#2326, I think that would be a much better use of our time and I'd be happy to try and implement suggestions. I tried variants of JuliaLang/julia#2326 (comment) and couldn't get any of them to work

@cjprybol cjprybol closed this Mar 4, 2017
@nalimilan
Copy link
Member

I've tried starting a discussion about the way to move forward at JuliaLang/julia#20815, let's see what can be done about this.

@ararslan
Copy link
Member

ararslan commented Mar 4, 2017

In the meantime, thanks so much for your hard work here, Cameron. Much appreciated.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants