Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict wrapped types to reduce invalidations #310

Merged
merged 3 commits into from
Nov 7, 2020
Merged

Conversation

nalimilan
Copy link
Member

Methods like convert(::Type{Any}, ::CategoricalValue) triggers lots of invalidations. Restricting the wrapped types to AbstractString, AbstractChar and Number alleviates this problem without affecting usability too much.

One limit with this approach is that e.g. convert(Union{String, T}, CategoricalValue{String}) won't work anymore for any T, even though convert(String, CategoricalValue{String}) will. Though thanks to special casing, it will work for T <: Missing and T <: Nothing.

This also means that CategoricalArray{Any} is no longer supported. Adapt promotion rules to ensure that mixing e.g. strings and integers gives Union{String, Integer}. Drop support for nothing.

Also remove the custom repr method which was a legacy of the CategoricalString era.

Methods like `convert(::Type{Any}, ::CategoricalValue)` triggers lots of invalidations.
Restricting the wrapped types to `AbstractString`, `AbstractChar` and `Number` alleviates
this problem without affecting usability too much.

One limit with this approach is that e.g. `convert(Union{String, T}, CategoricalValue{String})`
won't work anymore for any `T`, even though `convert(String, CategoricalValue{String})` will.
Though thanks to special casing, it will work for `T <: Missing` and `T <: Nothing`.

This also means that `CategoricalArray{Any}` is no longer supported. Adapt promotion rules
to ensure that mixing e.g. strings and integers gives `Union{String, Integer}`. Drop support
for `nothing`.

Also remove the custom `repr` method which was a legacy of the `CategoricalString` era.
Comment on lines -153 to -157
# get! with CategoricalValue{Any} (#220)
p1 = CategoricalPool(Any['a', 'b', 'c'])
p2 = CategoricalPool(Any['a', 'b', 'x'])
@test get!(p1, p2[1]) === UInt32(1)
@test get!(p1, p2[3]) === UInt32(4)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tlienart @ablaom CategoricalArray{Any} appeared to be used by MLJ at least in some corner cases (#220). What do you think of this PR?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not that we really need CategoricalArray{Any} but users will invariably try to create categorical arrays out of Any vectors. (I'm pretty sure that's why we went to the trouble of working out our corner cases.)

So what will happen if I call categorical(Any[1, 2, 3]) in this PR? If the fallback was to first look for a type tightening ( v -> [v...]) then maybe we could live with this. But if it just throws an error, this would be pretty annoying, I imagine.

I also vote for including Symbol. A lot of MLJ tests already use Symbol and I presume others have committed the same sin. We could live without it I guess, but excluding it as a workaround for compilation latency issues sounds a little perverse to me. Why are Symbols a problem for the invalidations (which I don't know anything about) but Strings are not? Perhaps this is something that can be addressed on the compiler end, as it likely turns up elsewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the fallback was to first look for a type tightening

I think fallback should check the contents not eltype of the created vector.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we could have a fallback constructor which would try to find a narrow type. collect(v for v in x) would do the trick though ideally we would avoid making a copy (but that would take more work).

I also vote for including Symbol. A lot of MLJ tests already use Symbol and I presume others have committed the same sin. We could live without it I guess, but excluding it as a workaround for compilation latency issues sounds a little perverse to me. Why are Symbols a problem for the invalidations (which I don't know anything about) but Strings are not? Perhaps this is something that can be addressed on the compiler end, as it likely turns up elsewhere.

This whole PR is kind of perverse already. The point is that it makes things less flexible to reduce work for the compiler. Invalidations are a tricky business. Things have improved a lot recently but AFAICT the issue will remain (see #177). String is as problematic as Symbol, but supporting both creates twice as many invalidations so that's still a problem.

Since both types are very similar and Symbol has no advantage over String in this context I'd rather bite the bullet and stop supporting it if it's used mainly in tests. Are you aware of actual uses in real use cases where String wouldn't make more sense? (Note that we can easily support it again in the future if the compiler improves.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would avoid making a copy (but that would take more work)

In data frames I implemented a procedure where we check the first element of the array and decide what to do using this information (a special case is 0-length array with eltype Any then, but it can probably throw an error)

Are you aware of actual uses in real use cases where string wouldn't make more sense?

It is very common to use Symbol instead of string to ensure fast comparisons. This happens before things are converted to CategoricalArray (of course we can say to users - do not use Symbol in such cases, but use CategoricalVectors instead, but this will break many current workflows, when users are creating Dict{Symbol, SomeType} to have a fast dict lookup).

Copy link
Member Author

@nalimilan nalimilan Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I get when allowing Symbol. This is much worse than without it. Not sure how many of these can be fixed. EDIT: I'm afraid the == ones can't be fixed and they are quite serious.

julia> trees = invalidation_trees(invalidations)
12-element Vector{SnoopCompile.MethodInvalidations}:
 inserting Base.IteratorSize(::Type{var"#s2"} where var"#s2"<:(Missings.EachReplaceMissing{T, U} where U)) where T in Missings at /home/milan/.julia/packages/Missings/notdc/src/Missings.jl:83 invalidated:
   backedges: 1: superseding Base.IteratorSize(::Type) in Base at generator.jl:91 with MethodInstance for Base.IteratorSize(::Type{var"#s433"} where var"#s433"<:LinearAlgebra.Factorization{T} where T) (3 children)
   50 mt_cache

 inserting similar(A::AbstractRange, ::Type{Union{Missing, CategoricalValue{T, R} where R<:Integer}}, dims::Tuple{Vararg{Int64, N}}) where {T, N} in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/array.jl:649 invalidated:
   backedges: 1: superseding similar(a::AbstractArray, ::Type{T}, dims::Tuple{Vararg{Int64, N}}) where {T, N} in Base at abstractarray.jl:730 with MethodInstance for similar(::UnitRange{Int64}, ::Type, ::Tuple{Int64}) (3 children)

 inserting similar(A::AbstractRange, ::Type{CategoricalValue{T, R}}, dims::Tuple{Vararg{Int64, N}}) where {T, R, N} in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/array.jl:640 invalidated:
   backedges: 1: superseding similar(a::AbstractArray, ::Type{T}, dims::Tuple{Vararg{Int64, N}}) where {T, N} in Base at abstractarray.jl:730 with MethodInstance for similar(::UnitRange{Int64}, ::DataType, ::Tuple{Int64}) (3 children)

 inserting promote_rule(::Type{C}, ::Type{T}) where {C<:CategoricalValue, T} in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/value.jl:54 invalidated:
   backedges: 1: superseding promote_rule(::Type{var"#s824"} where var"#s824"<:AbstractIrrational, ::Type{T}) where T<:Real in Base at irrationals.jl:42 with MethodInstance for promote_rule(::Type{Union{}}, ::Type{UInt64}) (1 children)
              2: superseding promote_rule(::Type{var"#s73"} where var"#s73", ::Type{var"#s72"} where var"#s72") in Base at promotion.jl:245 with MethodInstance for promote_rule(::Type{T} where T<:Unsigned, ::Type{UInt64}) (5 children)
   86 mt_cache

 inserting similar(A::Vector{T} where T, ::Type{CategoricalValue{T, R}}, dims::Tuple{Vararg{Int64, N}}) where {T, R, N} in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/array.jl:640 invalidated:
   backedges: 1: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{Pair{DataType, Function}}, ::DataType, ::Tuple{Int64}) (1 children)
              2: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{Any}, ::DataType, ::Tuple{Int64}) (1 children)
              3: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{T} where T, ::DataType, ::Tuple{Int64}) (6 children)

 inserting similar(::Type{T}, dims::Tuple{Vararg{Int64, N}} where N) where {U, T<:(Array{Union{Missing, CategoricalValue{U, R} where R<:Integer}, N} where N)} in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/array.jl:660 invalidated:
   backedges: 1: superseding similar(::Type{T}, dims::Tuple{Vararg{Int64, N}} where N) where T<:AbstractArray in Base at abstractarray.jl:765 with MethodInstance for similar(::Type{Array{_A, _B}}, ::Tuple{Int64}) where {_A, _B} (4 children)
              2: superseding similar(::Type{T}, dims::Tuple{Vararg{Int64, N}} where N) where T<:AbstractArray in Base at abstractarray.jl:765 with MethodInstance for similar(::Type{Vector{_A}} where _A, ::Tuple{Int64}) (8 children)
              3: superseding similar(::Type{T}, dims::Tuple{Vararg{Int64, N}} where N) where T<:AbstractArray in Base at abstractarray.jl:765 with MethodInstance for similar(::Type{Array{_A, N} where N} where _A, ::Tuple{Int64}) (11 children)

 inserting similar(A::Vector{T} where T, ::Type{CategoricalValue{T, R} where R<:Integer}) where T in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/array.jl:643 invalidated:
   backedges: 1: superseding similar(a::Vector{T}, S::Type) where T in Base at array.jl:355 with MethodInstance for similar(::Vector{_A} where _A, ::Type) (26 children)
   1 mt_cache

 inserting convert(::Type{Union{Nothing, S}}, x::CategoricalValue) where S<:Union{AbstractChar, AbstractString, Number, Symbol} in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/value.jl:77 invalidated:
   backedges: 1: superseding convert(::Type{T}, x) where T>:Nothing in Base at some.jl:36 with MethodInstance for convert(::Type{Union{Nothing, String}}, ::Any) (32 children)

 inserting similar(A::Vector{T} where T, ::Type{CategoricalValue{T, R} where R<:Integer}, dims::Tuple{Vararg{Int64, N}}) where {T, N} in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/array.jl:643 invalidated:
   backedges: 1: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{Union{Int64, Symbol}}, ::Type, ::Tuple{Int64}) (1 children)
              2: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{Any}, ::Type, ::Tuple{Int64}) (3 children)
              3: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{Pair{DataType, Function}}, ::Type, ::Tuple{Int64}) (3 children)
              4: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{Pair{DataType, Function}}, ::Type, ::Tuple{Int64}) (5 children)
              5: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{Any}, ::Type, ::Tuple{Int64}) (5 children)
              6: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{T} where T, ::Type, ::Tuple{Int64}) (32 children)
   20 mt_cache

 inserting ==(x::Union{AbstractChar, AbstractString, Number, Symbol}, y::CategoricalValue) in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/value.jl:111 invalidated:
   backedges: 1: superseding ==(x, y) in Base at Base.jl:87 with MethodInstance for ==(::Symbol, ::Any) (121 children)

 inserting ==(x::CategoricalValue, y::Union{AbstractChar, AbstractString, Number, Symbol}) in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/value.jl:110 invalidated:
   backedges: 1: superseding ==(x, y) in Base at Base.jl:87 with MethodInstance for ==(::Any, ::Symbol) (212 children)

 inserting convert(::Type{S}, x::CategoricalValue) where S<:Union{AbstractChar, AbstractString, Number, Symbol} in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/value.jl:73 invalidated:
   mt_backedges:   1: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Base.IRShow.var"#33#35"}(::Any, ::Any) (0 children)
                   2: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Base.IRShow.var"#34#36"}(::Any, ::Any) (0 children)
                   3: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for setindex!(::Dict{Symbol, Any}, ::Any, ::Any) (0 children)
                   4: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(+)}(::Any, ::Any) (0 children)
                   5: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(-)}(::Any, ::Any) (0 children)
                   6: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(*)}(::Any, ::Any) (0 children)
                   7: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(/)}(::Any, ::Any) (0 children)
                   8: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(^)}(::Any, ::Any) (0 children)
                   9: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(div)}(::Any, ::Any) (0 children)
                  10: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for setindex!(::Dict{Symbol, Function}, ::Any, ::Any) (0 children)
                  11: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(==)}(::Any, ::Any) (0 children)
                  12: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(<)}(::Any, ::Any) (0 children)
                  13: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(>)}(::Any, ::Any) (0 children)
                  14: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(<=)}(::Any, ::Any) (0 children)
                  15: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(>=)}(::Any, ::Any) (0 children)
                  16: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{DataType, String}(::Any, ::Any) (0 children)
                  17: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::Dict{String, Any}, ::Any, ::Any) (0 children)
                  18: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#112#165"}(::Any, ::Any) (0 children)
                  19: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#113#166"}(::Any, ::Any) (0 children)
                  20: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#114#167"}(::Any, ::Any) (0 children)
                  21: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#115#168"}(::Any, ::Any) (0 children)
                  22: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#116#169"}(::Any, ::Any) (0 children)
                  23: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#117#170"}(::Any, ::Any) (0 children)
                  24: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#118#171"}(::Any, ::Any) (0 children)
                  25: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#119#172"}(::Any, ::Any) (0 children)
                  26: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#120#173"}(::Any, ::Any) (0 children)
                  27: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#121#174"}(::Any, ::Any) (0 children)
                  28: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#122#175"}(::Any, ::Any) (0 children)
                  29: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#123#176"}(::Any, ::Any) (0 children)
                  30: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#124#177"}(::Any, ::Any) (0 children)
                  31: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#125#178"}(::Any, ::Any) (0 children)
                  32: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#126#179"}(::Any, ::Any) (0 children)
                  33: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#127#180"}(::Any, ::Any) (0 children)
                  34: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#128#181"}(::Any, ::Any) (0 children)
                  35: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#129#182"}(::Any, ::Any) (0 children)
                  36: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#130#183"}(::Any, ::Any) (0 children)
                  37: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#131#184"}(::Any, ::Any) (0 children)
                  38: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#132#185"}(::Any, ::Any) (0 children)
                  39: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#133#186"}(::Any, ::Any) (0 children)
                  40: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#134#187"}(::Any, ::Any) (0 children)
                  41: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#135#188"}(::Any, ::Any) (0 children)
                  42: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#136#189"}(::Any, ::Any) (0 children)
                  43: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#137#190"}(::Any, ::Any) (0 children)
                  44: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#138#191"}(::Any, ::Any) (0 children)
                  45: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#139#192"}(::Any, ::Any) (0 children)
                  46: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#140#193"}(::Any, ::Any) (0 children)
                  47: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#141#194"}(::Any, ::Any) (0 children)
                  48: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#142#195"}(::Any, ::Any) (0 children)
                  49: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#143#196"}(::Any, ::Any) (0 children)
                  50: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#144#197"}(::Any, ::Any) (0 children)
                  51: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#145#198"}(::Any, ::Any) (0 children)
                  52: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#146#199"}(::Any, ::Any) (0 children)
                  53: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#147#200"}(::Any, ::Any) (0 children)
                  54: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#148#201"}(::Any, ::Any) (0 children)
                  55: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#149#202"}(::Any, ::Any) (0 children)
                  56: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#150#203"}(::Any, ::Any) (0 children)
                  57: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#151#204"}(::Any, ::Any) (0 children)
                  58: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#152#205"}(::Any, ::Any) (0 children)
                  59: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#153#206"}(::Any, ::Any) (0 children)
                  60: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#154#207"}(::Any, ::Any) (0 children)
                  61: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#155#208"}(::Any, ::Any) (0 children)
                  62: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#156#209"}(::Any, ::Any) (0 children)
                  63: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#157#210"}(::Any, ::Any) (0 children)
                  64: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#158#211"}(::Any, ::Any) (0 children)
                  65: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#159#212"}(::Any, ::Any) (0 children)
                  66: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#160#213"}(::Any, ::Any) (0 children)
                  67: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#161#214"}(::Any, ::Any) (0 children)
                  68: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#215#225"}(::Any, ::Any) (0 children)
                  69: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#216#226"}(::Any, ::Any) (0 children)
                  70: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#217#227"}(::Any, ::Any) (0 children)
                  71: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#218#228"}(::Any, ::Any) (0 children)
                  72: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#219#229"}(::Any, ::Any) (0 children)
                  73: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#220#230"}(::Any, ::Any) (0 children)
                  74: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#221#231"}(::Any, ::Any) (0 children)
                  75: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#222#232"}(::Any, ::Any) (0 children)
                  76: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#223#233"}(::Any, ::Any) (0 children)
                  77: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#224#234"}(::Any, ::Any) (0 children)
                  78: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#235#243"}(::Any, ::Any) (0 children)
                  79: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#236#244"}(::Any, ::Any) (0 children)
                  80: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#237#245"}(::Any, ::Any) (0 children)
                  81: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#238#246"}(::Any, ::Any) (0 children)
                  82: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#239#247"}(::Any, ::Any) (0 children)
                  83: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.test)}(::Any, ::Any) (0 children)
                  84: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.REPLMode.parse_package)}(::Any, ::Any) (0 children)
                  85: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.REPLMode.complete_installed_packages)}(::Any, ::Any) (0 children)
                  86: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(identity)}(::Any, ::Any) (0 children)
                  87: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Pkg.REPLMode.var"#63#68"}(::Any, ::Any) (0 children)
                  88: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.REPLMode.complete_help)}(::Any, ::Any) (0 children)
                  89: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.instantiate)}(::Any, ::Any) (0 children)
                  90: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.rm)}(::Any, ::Any) (0 children)
                  91: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.add)}(::Any, ::Any) (0 children)
                  92: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Pkg.REPLMode.var"#64#69"}(::Any, ::Any) (0 children)
                  93: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.REPLMode.do_preserve)}(::Any, ::Any) (0 children)
                  94: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.REPLMode.complete_add_dev)}(::Any, ::Any) (0 children)
                  95: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.develop)}(::Any, ::Any) (0 children)
                  96: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Pkg.REPLMode.var"#65#70"}(::Any, ::Any) (0 children)
                  97: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.free)}(::Any, ::Any) (0 children)
                  98: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.pin)}(::Any, ::Any) (0 children)
                  99: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.build)}(::Any, ::Any) (0 children)
                 100: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.resolve)}(::Any, ::Any) (0 children)
                 101: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.activate)}(::Any, ::Any) (0 children)
                 102: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.REPLMode.parse_activate)}(::Any, ::Any) (0 children)
                 103: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.REPLMode.complete_activate)}(::Any, ::Any) (0 children)
                 104: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.up)}(::Any, ::Any) (0 children)
                 105: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.generate_deprecated)}(::Any, ::Any) (0 children)
                 106: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Pkg.REPLMode.var"#66#71"}(::Any, ::Any) (0 children)
                 107: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.precompile)}(::Any, ::Any) (0 children)
                 108: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.status)}(::Any, ::Any) (0 children)
                 109: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.gc)}(::Any, ::Any) (0 children)
                 110: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.undo)}(::Any, ::Any) (0 children)
                 111: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.redo)}(::Any, ::Any) (0 children)
                 112: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.Registry.add)}(::Any, ::Any) (0 children)
                 113: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Pkg.REPLMode.var"#67#72"}(::Any, ::Any) (0 children)
                 114: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.Registry.rm)}(::Any, ::Any) (0 children)
                 115: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.REPLMode.parse_registry)}(::Any, ::Any) (0 children)
                 116: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.Registry.update)}(::Any, ::Any) (0 children)
                 117: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.Registry.status)}(::Any, ::Any) (0 children)
                 118: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for convert(::Type{Union{Nothing, String}}, ::Any) (0 children)
                 119: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::Dict{String, Vector{Pkg.Types.Stage1}}, ::Vector{Pkg.Types.Stage1}, ::Any) (0 children)
                 120: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::Vector{String}, ::Any, ::Int64) (0 children)
                 121: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Test.Fail(::Symbol, ::Any, ::Any, ::Bool, ::LineNumberNode) (0 children)
                 122: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Test.Error(::Symbol, ::Any, ::Any, ::Nothing, ::LineNumberNode) (0 children)
                 123: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Test.Error(::Symbol, ::Any, ::Any, ::Any, ::LineNumberNode) (0 children)
                 124: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::Dict{String, String}, ::Any, ::Any) (0 children)
                 125: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Tar.Header(::Any, ::Any, ::Any, ::Any, ::Any) (0 children)
                 126: signature Tuple{typeof(convert), Type{Union{String, Symbol}}, Any} triggered MethodInstance for setindex!(::Dict{String, Union{String, Symbol}}, ::Any, ::String) (0 children)
                 127: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for push!(::Vector{String}, ::Any) (0 children)
                 128: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::Dict{String, Union{Base.SHA1, String}}, ::Any, ::Any) (0 children)
                 129: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::Dict{String, Vector{String}}, ::Any, ::Any) (0 children)
                 130: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::Dict{String, Base.UUID}, ::Base.UUID, ::Any) (0 children)
                 131: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for (::Base.var"#cvt1#1"{Tuple{Pkg.Types.PackageSpec, String}, _A} where _A)(::Int64) (0 children)
                 132: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::IdDict{Any, String}, ::Any, ::Any) (1 children)
                 133: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for convert(::Type{Pair{Symbol, Any}}, ::Pair) (1 children)
                 134: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Base.StackTraces.StackFrame(::Any, ::Any, ::Any, ::Core.MethodInstance, ::Bool, ::Bool, ::Int64) (1 children)
                 135: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Base.StackTraces.StackFrame(::Any, ::Any, ::Any, ::Nothing, ::Bool, ::Any, ::Int64) (1 children)
                 136: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Base.StackTraces.StackFrame(::Any, ::Any, ::Any, ::Core.CodeInfo, ::Bool, ::Any, ::Int64) (1 children)
                 137: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Base.StackTraces.StackFrame(::Any, ::Any, ::Any, ::Core.MethodInstance, ::Bool, ::Any, ::Int64) (1 children)
                 138: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for setproperty!(::Base.RefValue{Symbol}, ::Symbol, ::Any) (1 children)
                 139: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::Dict{String, String}, ::Any, ::String) (1 children)
                 140: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Tar.Header(::Any, ::Any, ::Any, ::Any, ::Any) (1 children)
                 141: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for REPL.LineEdit.PrefixSearchState(::Any, ::Any, ::Any, ::Any) (1 children)
                 142: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for REPL.LineEditREPL(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any) (1 children)
                 143: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for (::Base.var"#cvt1#1"{Tuple{String, String}, _A} where _A)(::Int64) (3 children)
                 144: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for (::Base.var"#cvt1#1"{Tuple{Base.UUID, String, String, VersionNumber}, _A} where _A)(::Int64) (3 children)
                 145: signature Tuple{typeof(convert), Union{Type{String}, Type{Pkg.Types.PackageSpec}}, Any} triggered MethodInstance for (::Base.var"#cvt1#1"{Tuple{Pkg.Types.PackageSpec, String}, _A} where _A)(::Int64) (5 children)
                 146: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for REPL.REPLCompletions.FieldCompletion(::DataType, ::Any) (6 children)
                 147: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Base.ImmutableDict{Symbol, Any}(::Base.ImmutableDict{Symbol, Any}, ::Any, ::Any) (9 children)
                 148: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for setindex!(::IdDict{Module, Symbol}, ::Any, ::Any) (10 children)
                 149: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Base.StackTraces.StackFrame(::Any, ::Any, ::Any, ::Core.CodeInfo, ::Bool, ::Bool, ::Int64) (13 children)
                 150: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for (::Base.var"#cvt1#1"{Tuple{String, Int32}, _A} where _A)(::Int64) (18 children)
                 151: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Base.StackTraces.StackFrame(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::UInt64) (42 children)
                 152: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Any}(::Any, ::Any) (97 children)
                 153: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for merge(::NamedTuple{(), Tuple{}}, ::Base.Iterators.Pairs) (342 children)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be more precise, there are currently 6397 invalidations with Julia master and CategoricalArrays master. With this PR this goes down to 1466 invalidations (minus 200 with JuliaIO/Tar.jl#78). Additionally allowing Symbol increases this number to 2981.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you aware of actual uses in real use cases where String wouldn't make more sense?

No. But it would not just be tests - probably some tutorials also.

For what it's worth. I have no complaints about CategoricalArrays in terms of load/compile times. Other dependencies in MLJ are way worse. Is there a strong use case for optimisation of CAs at this time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not so much load and compilation times than the latency that invalidations generate: they make other (often unrelated) operations slow the first time they are run. This notably affects the REPL's responsiveness.

I wouldn't care too much about this myself TBH. But it has become a recurring theme on Julia forums, to the point that when CategoricalArrays is mentioned people comment that it slows down everything and advise to use something else. So I'd rather concentrate on our core functionality and be fast at it, than be super flexible but with fewer users.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and advise to use something else

Oh no!

Thanks for the context.

@bkamins
Copy link
Member

bkamins commented Oct 31, 2020

I would also add Symbol to the list of supported types.

@bkamins
Copy link
Member

bkamins commented Oct 31, 2020

Also while we are at it it would be great to add unwrap function that works like get but would not error on missing or nothing.
Keeping using get (which should be deprecated I think) is problematic as it extends get from Base so it is problematic to add the cases with missing and nothing as it would be type piracy + get is inconsistent with the specification what get does in Base.

@nalimilan
Copy link
Member Author

I would also add Symbol to the list of supported types.

I thought about it, but I'm hesitant. Symbol is mainly useful to represent names of language objects or benefit from interning. Both are unlikely to be useful with CategoricalArray. And supporting Symbol will introduces more invalidations. We could try without it and see whether people really need it.

Also while we are at it it would be great to add unwrap function that works like get but would not error on missing or nothing.
Keeping using get (which should be deprecated I think) is problematic as it extends get from Base so it is problematic to add the cases with missing and nothing as it would be type piracy + get is inconsistent with the specification what get does in Base.

Let's handle this separately as it's completely orthogonal AFAICT.

@bkamins
Copy link
Member

bkamins commented Oct 31, 2020

Both are unlikely to be useful with CategoricalArray.

OK. In DataFrames 0.21 it would throw an error, but fortunately in 0.22 we removed creation of CategoricalVector{Symbol} in stack.

@bkamins
Copy link
Member

bkamins commented Nov 3, 2020

@ablaom + @nalimilan - given the findings of @nalimilan that allowing Symbol increases the number of invalidations more than twice (from 1400 to almost 3000) I think, given latency reduction is an important objective for the ecosystem in general and the fact that Symbol was really originally intended not to be used as data storage, that we can go this way:

  1. We disallow Symbol in CategoricalArrays.jl for now as a storage type (regretfully, as for months I have preferred to have it the other way around)
  2. We might consider adding a constructor that when it sees a collection containing Symbol it converts them to String? The benefit of such an approach is that it will be much more efficient than categorical(string.(collection))

@ablaom
Copy link

ablaom commented Nov 3, 2020

  1. We might consider adding a constructor that when it sees a collection containing Symbol it converts them to String? The benefit of such an approach is that it will be much more efficient than categorical(string.(collection))

I thought about this but I think it would only lead to unexpected problems. For example, using == to compare a CategoricalValue{String} "blue" to :blue will not return true, right?

@bkamins
Copy link
Member

bkamins commented Nov 3, 2020

will not return true, right?

Right - unfortunately.

@nalimilan
Copy link
Member Author

Yes that would be the limit of converting automatically symbols to CategoricalValue{String}. Do you think it would still be better than throwing an error, at least as a temporary deprecation path (with a warning)?

@ablaom
Copy link

ablaom commented Nov 4, 2020

Well, I wouldn't trust my code to be correct anymore, so I don't see any advantage of a warning over an informative error.

@nalimilan
Copy link
Member Author

OK. I've pushed commits to narrow element type when possible and improve error messages. Let me know what you think.

Copy link

@ablaom ablaom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest changes look good to me.

@nalimilan
Copy link
Member Author

OK, thanks, let's see how it goes...

@nalimilan nalimilan merged commit 8709c1f into master Nov 7, 2020
@nalimilan nalimilan deleted the nl/invalidations branch November 7, 2020 12:16
@bkamins
Copy link
Member

bkamins commented Nov 7, 2020

Is now a release going to be tagged and we should update DataFrames.jl dependency?

@nalimilan
Copy link
Member Author

Yes. I've bumped the version to 0.9.0 on master, I can tag at any time when DataFrames 0.22 is ready.

@bkamins
Copy link
Member

bkamins commented Nov 7, 2020

I thin it can be tagged. As we need to make a PR bumping a dependency in DataFrames.jl. This will hopefully make H2O benchmarks a bit better before we make 1.0 release.

@nalimilan
Copy link
Member Author

@jlumpe
Copy link

jlumpe commented Mar 2, 2021

This is sort of a bummer, I was using it with Tuple{String, String} and now it no longer works for me. Any plans to add back support for arbitrary eltypes in the future?

@nalimilan
Copy link
Member Author

Not unless a solution is found to reduce invalidations (or their impact) in Julia, I'm afraid... :-/

Out of curiosity, could you tell me more about your use case? It's always interesting to know what people need.

@jlumpe
Copy link

jlumpe commented Mar 2, 2021

Can you explain what the term "invalidation" means here?

I'm essentially using it to store values from two different columns in a table, which form a sort of composite primary key for another set of objects.

@nalimilan
Copy link
Member Author

Can you explain what the term "invalidation" means here?

See https://julialang.org/blog/2020/08/invalidations/, #177 and discussion above.

I'm essentially using it to store values from two different columns in a table, which form a sort of composite primary key for another set of objects.

OK. Any reason not to use PooledArray instead of CategoricalArray?

@rokapre
Copy link

rokapre commented Mar 26, 2021

Hello,

Recently in Julia 1.6 and the most recent version of Categorical Arrays (v0.9.3) some of my code broke because it seems like CategoricalArray cannot handle types like NamedTuple among other Julia-specific types. I say Julia-specific because often times I am using RCall and ggplot2 for plotting and in order to pass a dataframe into R, I used to convert it using categorical() so that in R it would show up as a factor for the aes() argument in ggplot.

Also, even when using libraries like GLM.jl or MixedModels.jl, categorical needs to be used and I used to be able to just convert the type to a categorical and proceed. But now there is an extra step of needing to convert with string.() and then converting to categorical which is not as clean as before.

Just mentioning these use-cases of categorical in case in the future there is a way to support them.

Best,

Rohan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants