-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restrict wrapped types to reduce invalidations #310
Conversation
Methods like `convert(::Type{Any}, ::CategoricalValue)` triggers lots of invalidations. Restricting the wrapped types to `AbstractString`, `AbstractChar` and `Number` alleviates this problem without affecting usability too much. One limit with this approach is that e.g. `convert(Union{String, T}, CategoricalValue{String})` won't work anymore for any `T`, even though `convert(String, CategoricalValue{String})` will. Though thanks to special casing, it will work for `T <: Missing` and `T <: Nothing`. This also means that `CategoricalArray{Any}` is no longer supported. Adapt promotion rules to ensure that mixing e.g. strings and integers gives `Union{String, Integer}`. Drop support for `nothing`. Also remove the custom `repr` method which was a legacy of the `CategoricalString` era.
# get! with CategoricalValue{Any} (#220) | ||
p1 = CategoricalPool(Any['a', 'b', 'c']) | ||
p2 = CategoricalPool(Any['a', 'b', 'x']) | ||
@test get!(p1, p2[1]) === UInt32(1) | ||
@test get!(p1, p2[3]) === UInt32(4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not that we really need CategoricalArray{Any}
but users will invariably try to create categorical arrays out of Any
vectors. (I'm pretty sure that's why we went to the trouble of working out our corner cases.)
So what will happen if I call categorical(Any[1, 2, 3])
in this PR? If the fallback was to first look for a type tightening ( v -> [v...]) then maybe we could live with this. But if it just throws an error, this would be pretty annoying, I imagine.
I also vote for including Symbol
. A lot of MLJ tests already use Symbol
and I presume others have committed the same sin. We could live without it I guess, but excluding it as a workaround for compilation latency issues sounds a little perverse to me. Why are Symbol
s a problem for the invalidations (which I don't know anything about) but String
s are not? Perhaps this is something that can be addressed on the compiler end, as it likely turns up elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the fallback was to first look for a type tightening
I think fallback should check the contents not eltype
of the created vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah we could have a fallback constructor which would try to find a narrow type. collect(v for v in x)
would do the trick though ideally we would avoid making a copy (but that would take more work).
I also vote for including
Symbol
. A lot of MLJ tests already useSymbol
and I presume others have committed the same sin. We could live without it I guess, but excluding it as a workaround for compilation latency issues sounds a little perverse to me. Why areSymbol
s a problem for the invalidations (which I don't know anything about) butString
s are not? Perhaps this is something that can be addressed on the compiler end, as it likely turns up elsewhere.
This whole PR is kind of perverse already. The point is that it makes things less flexible to reduce work for the compiler. Invalidations are a tricky business. Things have improved a lot recently but AFAICT the issue will remain (see #177). String
is as problematic as Symbol
, but supporting both creates twice as many invalidations so that's still a problem.
Since both types are very similar and Symbol
has no advantage over String
in this context I'd rather bite the bullet and stop supporting it if it's used mainly in tests. Are you aware of actual uses in real use cases where String
wouldn't make more sense? (Note that we can easily support it again in the future if the compiler improves.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we would avoid making a copy (but that would take more work)
In data frames I implemented a procedure where we check the first element of the array and decide what to do using this information (a special case is 0-length array with eltype Any
then, but it can probably throw an error)
Are you aware of actual uses in real use cases where string wouldn't make more sense?
It is very common to use Symbol
instead of string to ensure fast comparisons. This happens before things are converted to CategoricalArray
(of course we can say to users - do not use Symbol
in such cases, but use CategoricalVector
s instead, but this will break many current workflows, when users are creating Dict{Symbol, SomeType}
to have a fast dict lookup).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's what I get when allowing Symbol
. This is much worse than without it. Not sure how many of these can be fixed. EDIT: I'm afraid the ==
ones can't be fixed and they are quite serious.
julia> trees = invalidation_trees(invalidations)
12-element Vector{SnoopCompile.MethodInvalidations}:
inserting Base.IteratorSize(::Type{var"#s2"} where var"#s2"<:(Missings.EachReplaceMissing{T, U} where U)) where T in Missings at /home/milan/.julia/packages/Missings/notdc/src/Missings.jl:83 invalidated:
backedges: 1: superseding Base.IteratorSize(::Type) in Base at generator.jl:91 with MethodInstance for Base.IteratorSize(::Type{var"#s433"} where var"#s433"<:LinearAlgebra.Factorization{T} where T) (3 children)
50 mt_cache
inserting similar(A::AbstractRange, ::Type{Union{Missing, CategoricalValue{T, R} where R<:Integer}}, dims::Tuple{Vararg{Int64, N}}) where {T, N} in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/array.jl:649 invalidated:
backedges: 1: superseding similar(a::AbstractArray, ::Type{T}, dims::Tuple{Vararg{Int64, N}}) where {T, N} in Base at abstractarray.jl:730 with MethodInstance for similar(::UnitRange{Int64}, ::Type, ::Tuple{Int64}) (3 children)
inserting similar(A::AbstractRange, ::Type{CategoricalValue{T, R}}, dims::Tuple{Vararg{Int64, N}}) where {T, R, N} in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/array.jl:640 invalidated:
backedges: 1: superseding similar(a::AbstractArray, ::Type{T}, dims::Tuple{Vararg{Int64, N}}) where {T, N} in Base at abstractarray.jl:730 with MethodInstance for similar(::UnitRange{Int64}, ::DataType, ::Tuple{Int64}) (3 children)
inserting promote_rule(::Type{C}, ::Type{T}) where {C<:CategoricalValue, T} in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/value.jl:54 invalidated:
backedges: 1: superseding promote_rule(::Type{var"#s824"} where var"#s824"<:AbstractIrrational, ::Type{T}) where T<:Real in Base at irrationals.jl:42 with MethodInstance for promote_rule(::Type{Union{}}, ::Type{UInt64}) (1 children)
2: superseding promote_rule(::Type{var"#s73"} where var"#s73", ::Type{var"#s72"} where var"#s72") in Base at promotion.jl:245 with MethodInstance for promote_rule(::Type{T} where T<:Unsigned, ::Type{UInt64}) (5 children)
86 mt_cache
inserting similar(A::Vector{T} where T, ::Type{CategoricalValue{T, R}}, dims::Tuple{Vararg{Int64, N}}) where {T, R, N} in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/array.jl:640 invalidated:
backedges: 1: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{Pair{DataType, Function}}, ::DataType, ::Tuple{Int64}) (1 children)
2: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{Any}, ::DataType, ::Tuple{Int64}) (1 children)
3: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{T} where T, ::DataType, ::Tuple{Int64}) (6 children)
inserting similar(::Type{T}, dims::Tuple{Vararg{Int64, N}} where N) where {U, T<:(Array{Union{Missing, CategoricalValue{U, R} where R<:Integer}, N} where N)} in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/array.jl:660 invalidated:
backedges: 1: superseding similar(::Type{T}, dims::Tuple{Vararg{Int64, N}} where N) where T<:AbstractArray in Base at abstractarray.jl:765 with MethodInstance for similar(::Type{Array{_A, _B}}, ::Tuple{Int64}) where {_A, _B} (4 children)
2: superseding similar(::Type{T}, dims::Tuple{Vararg{Int64, N}} where N) where T<:AbstractArray in Base at abstractarray.jl:765 with MethodInstance for similar(::Type{Vector{_A}} where _A, ::Tuple{Int64}) (8 children)
3: superseding similar(::Type{T}, dims::Tuple{Vararg{Int64, N}} where N) where T<:AbstractArray in Base at abstractarray.jl:765 with MethodInstance for similar(::Type{Array{_A, N} where N} where _A, ::Tuple{Int64}) (11 children)
inserting similar(A::Vector{T} where T, ::Type{CategoricalValue{T, R} where R<:Integer}) where T in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/array.jl:643 invalidated:
backedges: 1: superseding similar(a::Vector{T}, S::Type) where T in Base at array.jl:355 with MethodInstance for similar(::Vector{_A} where _A, ::Type) (26 children)
1 mt_cache
inserting convert(::Type{Union{Nothing, S}}, x::CategoricalValue) where S<:Union{AbstractChar, AbstractString, Number, Symbol} in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/value.jl:77 invalidated:
backedges: 1: superseding convert(::Type{T}, x) where T>:Nothing in Base at some.jl:36 with MethodInstance for convert(::Type{Union{Nothing, String}}, ::Any) (32 children)
inserting similar(A::Vector{T} where T, ::Type{CategoricalValue{T, R} where R<:Integer}, dims::Tuple{Vararg{Int64, N}}) where {T, N} in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/array.jl:643 invalidated:
backedges: 1: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{Union{Int64, Symbol}}, ::Type, ::Tuple{Int64}) (1 children)
2: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{Any}, ::Type, ::Tuple{Int64}) (3 children)
3: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{Pair{DataType, Function}}, ::Type, ::Tuple{Int64}) (3 children)
4: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{Pair{DataType, Function}}, ::Type, ::Tuple{Int64}) (5 children)
5: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{Any}, ::Type, ::Tuple{Int64}) (5 children)
6: superseding similar(a::Array, T::Type, dims::Tuple{Vararg{Int64, N}}) where N in Base at array.jl:358 with MethodInstance for similar(::Vector{T} where T, ::Type, ::Tuple{Int64}) (32 children)
20 mt_cache
inserting ==(x::Union{AbstractChar, AbstractString, Number, Symbol}, y::CategoricalValue) in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/value.jl:111 invalidated:
backedges: 1: superseding ==(x, y) in Base at Base.jl:87 with MethodInstance for ==(::Symbol, ::Any) (121 children)
inserting ==(x::CategoricalValue, y::Union{AbstractChar, AbstractString, Number, Symbol}) in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/value.jl:110 invalidated:
backedges: 1: superseding ==(x, y) in Base at Base.jl:87 with MethodInstance for ==(::Any, ::Symbol) (212 children)
inserting convert(::Type{S}, x::CategoricalValue) where S<:Union{AbstractChar, AbstractString, Number, Symbol} in CategoricalArrays at /home/milan/.julia/dev/CategoricalArrays/src/value.jl:73 invalidated:
mt_backedges: 1: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Base.IRShow.var"#33#35"}(::Any, ::Any) (0 children)
2: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Base.IRShow.var"#34#36"}(::Any, ::Any) (0 children)
3: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for setindex!(::Dict{Symbol, Any}, ::Any, ::Any) (0 children)
4: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(+)}(::Any, ::Any) (0 children)
5: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(-)}(::Any, ::Any) (0 children)
6: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(*)}(::Any, ::Any) (0 children)
7: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(/)}(::Any, ::Any) (0 children)
8: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(^)}(::Any, ::Any) (0 children)
9: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(div)}(::Any, ::Any) (0 children)
10: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for setindex!(::Dict{Symbol, Function}, ::Any, ::Any) (0 children)
11: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(==)}(::Any, ::Any) (0 children)
12: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(<)}(::Any, ::Any) (0 children)
13: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(>)}(::Any, ::Any) (0 children)
14: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(<=)}(::Any, ::Any) (0 children)
15: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(>=)}(::Any, ::Any) (0 children)
16: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{DataType, String}(::Any, ::Any) (0 children)
17: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::Dict{String, Any}, ::Any, ::Any) (0 children)
18: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#112#165"}(::Any, ::Any) (0 children)
19: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#113#166"}(::Any, ::Any) (0 children)
20: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#114#167"}(::Any, ::Any) (0 children)
21: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#115#168"}(::Any, ::Any) (0 children)
22: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#116#169"}(::Any, ::Any) (0 children)
23: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#117#170"}(::Any, ::Any) (0 children)
24: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#118#171"}(::Any, ::Any) (0 children)
25: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#119#172"}(::Any, ::Any) (0 children)
26: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#120#173"}(::Any, ::Any) (0 children)
27: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#121#174"}(::Any, ::Any) (0 children)
28: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#122#175"}(::Any, ::Any) (0 children)
29: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#123#176"}(::Any, ::Any) (0 children)
30: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#124#177"}(::Any, ::Any) (0 children)
31: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#125#178"}(::Any, ::Any) (0 children)
32: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#126#179"}(::Any, ::Any) (0 children)
33: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#127#180"}(::Any, ::Any) (0 children)
34: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#128#181"}(::Any, ::Any) (0 children)
35: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#129#182"}(::Any, ::Any) (0 children)
36: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#130#183"}(::Any, ::Any) (0 children)
37: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#131#184"}(::Any, ::Any) (0 children)
38: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#132#185"}(::Any, ::Any) (0 children)
39: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#133#186"}(::Any, ::Any) (0 children)
40: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#134#187"}(::Any, ::Any) (0 children)
41: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#135#188"}(::Any, ::Any) (0 children)
42: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#136#189"}(::Any, ::Any) (0 children)
43: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#137#190"}(::Any, ::Any) (0 children)
44: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#138#191"}(::Any, ::Any) (0 children)
45: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#139#192"}(::Any, ::Any) (0 children)
46: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#140#193"}(::Any, ::Any) (0 children)
47: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#141#194"}(::Any, ::Any) (0 children)
48: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#142#195"}(::Any, ::Any) (0 children)
49: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#143#196"}(::Any, ::Any) (0 children)
50: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#144#197"}(::Any, ::Any) (0 children)
51: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#145#198"}(::Any, ::Any) (0 children)
52: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#146#199"}(::Any, ::Any) (0 children)
53: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#147#200"}(::Any, ::Any) (0 children)
54: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#148#201"}(::Any, ::Any) (0 children)
55: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#149#202"}(::Any, ::Any) (0 children)
56: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#150#203"}(::Any, ::Any) (0 children)
57: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#151#204"}(::Any, ::Any) (0 children)
58: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#152#205"}(::Any, ::Any) (0 children)
59: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#153#206"}(::Any, ::Any) (0 children)
60: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#154#207"}(::Any, ::Any) (0 children)
61: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#155#208"}(::Any, ::Any) (0 children)
62: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#156#209"}(::Any, ::Any) (0 children)
63: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#157#210"}(::Any, ::Any) (0 children)
64: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#158#211"}(::Any, ::Any) (0 children)
65: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#159#212"}(::Any, ::Any) (0 children)
66: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#160#213"}(::Any, ::Any) (0 children)
67: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#161#214"}(::Any, ::Any) (0 children)
68: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#215#225"}(::Any, ::Any) (0 children)
69: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#216#226"}(::Any, ::Any) (0 children)
70: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#217#227"}(::Any, ::Any) (0 children)
71: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#218#228"}(::Any, ::Any) (0 children)
72: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#219#229"}(::Any, ::Any) (0 children)
73: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#220#230"}(::Any, ::Any) (0 children)
74: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#221#231"}(::Any, ::Any) (0 children)
75: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#222#232"}(::Any, ::Any) (0 children)
76: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#223#233"}(::Any, ::Any) (0 children)
77: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#224#234"}(::Any, ::Any) (0 children)
78: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#235#243"}(::Any, ::Any) (0 children)
79: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#236#244"}(::Any, ::Any) (0 children)
80: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#237#245"}(::Any, ::Any) (0 children)
81: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#238#246"}(::Any, ::Any) (0 children)
82: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Pair{String, REPL.LineEdit.var"#239#247"}(::Any, ::Any) (0 children)
83: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.test)}(::Any, ::Any) (0 children)
84: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.REPLMode.parse_package)}(::Any, ::Any) (0 children)
85: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.REPLMode.complete_installed_packages)}(::Any, ::Any) (0 children)
86: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(identity)}(::Any, ::Any) (0 children)
87: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Pkg.REPLMode.var"#63#68"}(::Any, ::Any) (0 children)
88: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.REPLMode.complete_help)}(::Any, ::Any) (0 children)
89: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.instantiate)}(::Any, ::Any) (0 children)
90: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.rm)}(::Any, ::Any) (0 children)
91: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.add)}(::Any, ::Any) (0 children)
92: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Pkg.REPLMode.var"#64#69"}(::Any, ::Any) (0 children)
93: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.REPLMode.do_preserve)}(::Any, ::Any) (0 children)
94: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.REPLMode.complete_add_dev)}(::Any, ::Any) (0 children)
95: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.develop)}(::Any, ::Any) (0 children)
96: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Pkg.REPLMode.var"#65#70"}(::Any, ::Any) (0 children)
97: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.free)}(::Any, ::Any) (0 children)
98: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.pin)}(::Any, ::Any) (0 children)
99: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.build)}(::Any, ::Any) (0 children)
100: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.resolve)}(::Any, ::Any) (0 children)
101: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.activate)}(::Any, ::Any) (0 children)
102: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.REPLMode.parse_activate)}(::Any, ::Any) (0 children)
103: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.REPLMode.complete_activate)}(::Any, ::Any) (0 children)
104: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.up)}(::Any, ::Any) (0 children)
105: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.generate_deprecated)}(::Any, ::Any) (0 children)
106: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Pkg.REPLMode.var"#66#71"}(::Any, ::Any) (0 children)
107: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.precompile)}(::Any, ::Any) (0 children)
108: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.status)}(::Any, ::Any) (0 children)
109: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.gc)}(::Any, ::Any) (0 children)
110: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.undo)}(::Any, ::Any) (0 children)
111: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.API.redo)}(::Any, ::Any) (0 children)
112: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.Registry.add)}(::Any, ::Any) (0 children)
113: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Pkg.REPLMode.var"#67#72"}(::Any, ::Any) (0 children)
114: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.Registry.rm)}(::Any, ::Any) (0 children)
115: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.REPLMode.parse_registry)}(::Any, ::Any) (0 children)
116: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.Registry.update)}(::Any, ::Any) (0 children)
117: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, typeof(Pkg.Registry.status)}(::Any, ::Any) (0 children)
118: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for convert(::Type{Union{Nothing, String}}, ::Any) (0 children)
119: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::Dict{String, Vector{Pkg.Types.Stage1}}, ::Vector{Pkg.Types.Stage1}, ::Any) (0 children)
120: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::Vector{String}, ::Any, ::Int64) (0 children)
121: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Test.Fail(::Symbol, ::Any, ::Any, ::Bool, ::LineNumberNode) (0 children)
122: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Test.Error(::Symbol, ::Any, ::Any, ::Nothing, ::LineNumberNode) (0 children)
123: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Test.Error(::Symbol, ::Any, ::Any, ::Any, ::LineNumberNode) (0 children)
124: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::Dict{String, String}, ::Any, ::Any) (0 children)
125: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Tar.Header(::Any, ::Any, ::Any, ::Any, ::Any) (0 children)
126: signature Tuple{typeof(convert), Type{Union{String, Symbol}}, Any} triggered MethodInstance for setindex!(::Dict{String, Union{String, Symbol}}, ::Any, ::String) (0 children)
127: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for push!(::Vector{String}, ::Any) (0 children)
128: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::Dict{String, Union{Base.SHA1, String}}, ::Any, ::Any) (0 children)
129: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::Dict{String, Vector{String}}, ::Any, ::Any) (0 children)
130: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::Dict{String, Base.UUID}, ::Base.UUID, ::Any) (0 children)
131: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for (::Base.var"#cvt1#1"{Tuple{Pkg.Types.PackageSpec, String}, _A} where _A)(::Int64) (0 children)
132: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::IdDict{Any, String}, ::Any, ::Any) (1 children)
133: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for convert(::Type{Pair{Symbol, Any}}, ::Pair) (1 children)
134: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Base.StackTraces.StackFrame(::Any, ::Any, ::Any, ::Core.MethodInstance, ::Bool, ::Bool, ::Int64) (1 children)
135: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Base.StackTraces.StackFrame(::Any, ::Any, ::Any, ::Nothing, ::Bool, ::Any, ::Int64) (1 children)
136: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Base.StackTraces.StackFrame(::Any, ::Any, ::Any, ::Core.CodeInfo, ::Bool, ::Any, ::Int64) (1 children)
137: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Base.StackTraces.StackFrame(::Any, ::Any, ::Any, ::Core.MethodInstance, ::Bool, ::Any, ::Int64) (1 children)
138: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for setproperty!(::Base.RefValue{Symbol}, ::Symbol, ::Any) (1 children)
139: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for setindex!(::Dict{String, String}, ::Any, ::String) (1 children)
140: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Tar.Header(::Any, ::Any, ::Any, ::Any, ::Any) (1 children)
141: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for REPL.LineEdit.PrefixSearchState(::Any, ::Any, ::Any, ::Any) (1 children)
142: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for REPL.LineEditREPL(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::Any) (1 children)
143: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for (::Base.var"#cvt1#1"{Tuple{String, String}, _A} where _A)(::Int64) (3 children)
144: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for (::Base.var"#cvt1#1"{Tuple{Base.UUID, String, String, VersionNumber}, _A} where _A)(::Int64) (3 children)
145: signature Tuple{typeof(convert), Union{Type{String}, Type{Pkg.Types.PackageSpec}}, Any} triggered MethodInstance for (::Base.var"#cvt1#1"{Tuple{Pkg.Types.PackageSpec, String}, _A} where _A)(::Int64) (5 children)
146: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for REPL.REPLCompletions.FieldCompletion(::DataType, ::Any) (6 children)
147: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Base.ImmutableDict{Symbol, Any}(::Base.ImmutableDict{Symbol, Any}, ::Any, ::Any) (9 children)
148: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for setindex!(::IdDict{Module, Symbol}, ::Any, ::Any) (10 children)
149: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Base.StackTraces.StackFrame(::Any, ::Any, ::Any, ::Core.CodeInfo, ::Bool, ::Bool, ::Int64) (13 children)
150: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for (::Base.var"#cvt1#1"{Tuple{String, Int32}, _A} where _A)(::Int64) (18 children)
151: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Base.StackTraces.StackFrame(::Any, ::Any, ::Any, ::Any, ::Any, ::Any, ::UInt64) (42 children)
152: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for Pair{Symbol, Any}(::Any, ::Any) (97 children)
153: signature Tuple{typeof(convert), Type{Symbol}, Any} triggered MethodInstance for merge(::NamedTuple{(), Tuple{}}, ::Base.Iterators.Pairs) (342 children)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be more precise, there are currently 6397 invalidations with Julia master and CategoricalArrays master. With this PR this goes down to 1466 invalidations (minus 200 with JuliaIO/Tar.jl#78). Additionally allowing Symbol
increases this number to 2981.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you aware of actual uses in real use cases where String wouldn't make more sense?
No. But it would not just be tests - probably some tutorials also.
For what it's worth. I have no complaints about CategoricalArrays in terms of load/compile times. Other dependencies in MLJ are way worse. Is there a strong use case for optimisation of CAs at this time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not so much load and compilation times than the latency that invalidations generate: they make other (often unrelated) operations slow the first time they are run. This notably affects the REPL's responsiveness.
I wouldn't care too much about this myself TBH. But it has become a recurring theme on Julia forums, to the point that when CategoricalArrays is mentioned people comment that it slows down everything and advise to use something else. So I'd rather concentrate on our core functionality and be fast at it, than be super flexible but with fewer users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and advise to use something else
Oh no!
Thanks for the context.
I would also add |
Also while we are at it it would be great to add |
I thought about it, but I'm hesitant.
Let's handle this separately as it's completely orthogonal AFAICT. |
OK. In DataFrames 0.21 it would throw an error, but fortunately in 0.22 we removed creation of |
@ablaom + @nalimilan - given the findings of @nalimilan that allowing
|
I thought about this but I think it would only lead to unexpected problems. For example, using |
Right - unfortunately. |
Yes that would be the limit of converting automatically symbols to |
Well, I wouldn't trust my code to be correct anymore, so I don't see any advantage of a warning over an informative error. |
OK. I've pushed commits to narrow element type when possible and improve error messages. Let me know what you think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest changes look good to me.
OK, thanks, let's see how it goes... |
Is now a release going to be tagged and we should update DataFrames.jl dependency? |
Yes. I've bumped the version to 0.9.0 on master, I can tag at any time when DataFrames 0.22 is ready. |
I thin it can be tagged. As we need to make a PR bumping a dependency in DataFrames.jl. This will hopefully make H2O benchmarks a bit better before we make 1.0 release. |
This is sort of a bummer, I was using it with |
Not unless a solution is found to reduce invalidations (or their impact) in Julia, I'm afraid... :-/ Out of curiosity, could you tell me more about your use case? It's always interesting to know what people need. |
Can you explain what the term "invalidation" means here? I'm essentially using it to store values from two different columns in a table, which form a sort of composite primary key for another set of objects. |
See https://julialang.org/blog/2020/08/invalidations/, #177 and discussion above.
OK. Any reason not to use |
Hello, Recently in Julia 1.6 and the most recent version of Categorical Arrays (v0.9.3) some of my code broke because it seems like Also, even when using libraries like GLM.jl or MixedModels.jl, categorical needs to be used and I used to be able to just convert the type to a categorical and proceed. But now there is an extra step of needing to convert with string.() and then converting to categorical which is not as clean as before. Just mentioning these use-cases of categorical in case in the future there is a way to support them. Best, Rohan |
Methods like
convert(::Type{Any}, ::CategoricalValue)
triggers lots of invalidations. Restricting the wrapped types toAbstractString
,AbstractChar
andNumber
alleviates this problem without affecting usability too much.One limit with this approach is that e.g.
convert(Union{String, T}, CategoricalValue{String})
won't work anymore for anyT
, even thoughconvert(String, CategoricalValue{String})
will. Though thanks to special casing, it will work forT <: Missing
andT <: Nothing
.This also means that
CategoricalArray{Any}
is no longer supported. Adapt promotion rules to ensure that mixing e.g. strings and integers givesUnion{String, Integer}
. Drop support fornothing
.Also remove the custom
repr
method which was a legacy of theCategoricalString
era.