Julep: attempting to measure (& test?) "absolute" risk of invalidation #36393

timholy · 2020-06-23T11:58:41Z

Short version: as protection against invalidation (and possibly "code quality" in general), perhaps we should consider some kind of nanosoldier test on the count of backedges of poorly-inferred MethodInstances. Also an update on progress (& regressions) from 1.4 to master.

There are two things that bother me about my recent work on improving inferrability of Base, Pkg, REPL, etc.:

it's all package-specific: I care about PkgX, so I search for and stomp on invalidations that it triggers, but that may or may not help users of PkgY. Worse, is there any risk it could make things worse for PkgY?
it's easy to imagine regressions. The fixes I've been making only rarely have a consequence that can be caught by @inferred, as many methods return a known type even if their internals are not well-inferred.

For these reasons, I've begun to wonder whether we can & should add some tests that look for "problematic" MethodInstances and tally their number of backedges. The goal would be to develop an absolute measure of our vulnerability to invalidation: at-risk MethodInstances that have the most backedges can cause the most invalidation. If we can reliably measure our vulnerability, then we could also introduce some "hold the line" tests, which prevent our vulnerability from increasing above a certain level. I recognize that there are serious downsides in adding such tests---someone wants to add cool feature X but it's blocked because it's not easy to do that inferrably---but there are also quality-concern risks with not having such tests. So, I thought I'd begin by at least trying to come up with some kind of overall measure of our vulnerability.

At the bottom of this I present a couple of scripts that seem useful. What they try to do is pull out MethodInstances that seem possibly problematic, in the sense that they fit patterns that I've seen while studying invalidations. Overwhelmingly, they involve abstract types, but many methods with abstract types seem reasonably OK, so I've tried to accommodate that in some specific ways. I should say right from the outset that I'm not at all confident that I've made the selection well, but at least it does pull out a lot of MethodInstances that have featured heavily in my invalidation work (as well as many that haven't). For more detail you should really just open up the code below.

The output of the analysis is a scatter plot, comparing where we were on Julia 1.4.2 vs where we are now on master. Each dot represents one possibly-problematic MethodInstance signature. Magenta is for exported names, green for non-exported names:

You can see that overall there seems to have been some improvement, at least if you restrict yourself to exported names: there's a strong tendency for magenta dots to be near or below the diagonal. Until I saw this, I had no idea how far we were towards "done," or even whether it's possible to ask that question in a reasonable way. While I've put a lot of time into hunting these invalidations down, overall this result makes me think that efforts to improve our code-quality are worth it, and that we're mostly not living in a world where #35844 is our only hope (CC @c42f). Time will tell...

It's worth noting that the scripts below generate an interactive plot, where you can click on individual dots and it will print the corresponding signature to the REPL. So running it locally can be an informative exercise. How informative it actually is comes down to how well I've selected "problematic" MethodInstances, of course, so take all this with a big grain of salt. (Quite a few of the dots on the plot are nothing to worry about, e.g., setindex!(::Vector{Any}, ::Any, ::Int) is perfectly fine. Others, like notify(::Base.GenericCondition{Base.Threads.SpinLock},::Any,::Bool,::Bool) seem to be at low risk for being extended by packages.)

Some cases where we've made a lot of progress are in various ==, !=, and isequal methods, a small handful of convert methods, some obscure yet surprisingly-important methods like +(::Ptr{Nothing},::Integer), and several string-generation and manipuluation methods (e.g., repr, thisind, lastindex, isempty(::AbstractString), getindex(::String, ::Any), ^(::String, ::Integer), Base.isidentifier, and Base._show_default). Interestingly, there are also a handful of significant regressions since 1.4, mostly in path-handling (we've had a huge increase in backedges for basename(::AbstractString), isfile(::Any)) but also a small number of others like in(::Any, ::Tuple{Symbol}), +(::Int, ::Any, ::Any), and Base.split_sign(::Integer). It is possible that some of these are consequences of inference fixes (e.g., basename(::AbstractString) seems better than basename(::Any)), but these would be worth digging into.

Based on this result I'm beginning to think that it probably would be worth introducing some "hold the line" tests in Base. But I'd be very curious about other folks' thoughts on this matter.

This script saves the data to a log file. Run once on each Julia version you want to analyze:

using MethodAnalysis

# Analyze MethodInstance signatures and select those that seem at risk for being invalidated.
function atrisktype(@nospecialize(typ))
    # signatures like `convert(Vector, a)`, `foo(::Vararg{Synbol,N}) where N` do not seem to pose a problem
    isa(typ, TypeVar) && return false
    # isbits parameters are not a problem
    isa(typ, Type) || return false
    if isa(typ, UnionAll)
        typ = Base.unwrap_unionall(typ)
    end
    # Exclude signatures with Union{}
    typ === Union{} && return false
    isa(typ, Union) && return atrisktype(typ.a) | atrisktype(typ.b)
    # Type{T}: signatures like `convert(::Type{AbstractString}, ::String)` are not problematic, so mark Type as OK
    typ <: Type && return false
    if typ <: Tuple && length(typ.parameters) >= 1
        p1 = typ.parameters[1]
        # Constructor calls are not themselves a problem (any `convert`s they trigger might be, but those are covered)
        isa(p1, Type) && p1 <: Type && return false
        # convert(::Type{T}, ::S) where S<:T is not problematic
        if p1 === typeof(Base.convert) || p1 === typeof(Core.convert) || p1 === typeof(Core.Compiler.convert)
            p2, p3 = typ.parameters[2], typ.parameters[3]
            if isa(p2, Type)
                p2 = Base.unwrap_unionall(p2)
                if isa(p2, DataType) && length(p2.parameters) === 1
                    T = p2.parameters[1]
                    isa(p3, Type) && isa(T, Type) && p3 <: T && return false
                end
            end
        # `getindex`, `length`, etc are OK for various Tuple{T1,T2,...}
        elseif (p1 === typeof(Base.getindex) || p1 === typeof(Core.Compiler.getindex)) ||
               (p1 === typeof(Base.length)  || p1 === typeof(Core.Compiler.length)) ||
               (p1 === typeof(Base.isempty) || p1 === typeof(Core.Compiler.isempty)) ||
               (p1 === typeof(Base.iterate) || p1 === typeof(Core.iterate) || p1 === typeof(Core.Compiler.iterate))
            p2 = typ.parameters[2]
            if isa(p2, Type)
                p2 = Base.unwrap_unionall(p2)
                p2 <: Tuple && return false
            end
        # show(io::IO, x) is OK as long as typeof(x) is safe
        elseif p1 === typeof(Base.show)
            atrisktype(typ.parameters[2]) && return true
            length(typ.parameters) == 3 && atrisktype(typ.parameters[3]) && return true
            return false
        end
    end
    # Standard DataTypes
    isconcretetype(typ) && return false
    # ::Function args are excluded
    typ === Function && return false
    !isempty(typ.parameters) && (any(atrisktype, typ.parameters) || return false)
    return true
end

# A few tests
@assert  atrisktype(Tuple{typeof(==),Any,Any})
@assert  atrisktype(Tuple{typeof(==),Symbol,Any})
@assert  atrisktype(Tuple{typeof(==),Any,Symbol})
@assert !atrisktype(Tuple{typeof(==),Symbol,Symbol})
@assert !atrisktype(Tuple{typeof(convert),Type{Any},Any})
@assert !atrisktype(Tuple{typeof(convert),Type{AbstractString},AbstractString})
@assert !atrisktype(Tuple{typeof(convert),Type{AbstractString},String})
@assert  atrisktype(Tuple{typeof(convert),Type{String},AbstractString})
@assert !atrisktype(Tuple{typeof(map),Function,Vector{Any}})
@assert !atrisktype(Tuple{typeof(getindex),Dict{Union{String,Int},Any},Union{String,Int}})
@assert  atrisktype(Tuple{typeof(getindex),Dict{Union{String,Int},Any},Any})
@assert !atrisktype(Tuple{Type{BoundsError},Any,Any})
@assert  atrisktype(Tuple{typeof(sin),Any})
@assert !atrisktype(Tuple{typeof(length),Tuple{Any,Any}})

function collect_mis(m::Method)
    list = Core.MethodInstance[]
    visit(m) do item
        if isa(item, Core.MethodInstance)
            push!(list, item)
            return false
        end
        return true
    end
    return list
end

const mis = Dict{Method,Vector{Core.MethodInstance}}()
visit() do item
    if item isa Method
        m = item
        mis[m] = collect_mis(m)
        return false
    end
    return true
end

# Count # of backedges for MethodInstances with abstract types
const becounter = Dict{Core.MethodInstance,Int}()
visit() do item
    if item isa Core.MethodInstance
        if atrisktype(item.specTypes)
            becounter[item] = length(all_backedges(item))
        end
        return false
    end
    return true
end

prs = sort!(collect(becounter); by=last)

open("/tmp/methdata_$VERSION.log", "w") do io
    for (mi, c) in prs
        c == 0 && continue
        println(io, mi.specTypes=>c)
    end
end

This script generates the interactive plot. You'll probably have to modify some file names.

using PyPlot

function parseline(line)
    m = match(r"(.*) => (.*)", line)
    sigstr, count = m.captures[1], parse(Int, m.captures[2])
    sig = try
        ex = Meta.parse(sigstr)
        eval(ex)
    catch
        return nothing
    end
    return sig, count
end

function parsedata(filename)
    lines = readlines(filename)
    sigcount = IdDict{Any,Int}()
    for line in lines
        ret = parseline(line)
        ret === nothing && continue
        sig, count = ret
        sigcount[sig] = count
    end
    return sigcount
end

function split_comparable(sigc1, sigc2)
    c1, c2, sigs = Int[], Int[], Any[]
    for (sig, c) in sigc1
        push!(sigs, sig)
        push!(c1, sigc1[sig])
        push!(c2, get(sigc2, sig, 0))
    end
    for (sig, c) in sigc2
        if !haskey(sigc1, sig)
            push!(sigs, sig)
            push!(c1, 0)
            push!(c2, c)
        end
    end
    return sigs, c1, c2
end

sigc16 = parsedata("/tmp/methdata_$VERSION.log")
sigc14 = parsedata("/tmp/methdata_1.4.3-pre.0.log")

sigs, c1, c2 = split_comparable(sigc14, sigc16)
mx1, mx2 = maximum(c1), maximum(c2)
isexported(sig) = (ft = Base.unwrap_unionall(sig).parameters[1]; isdefined(Main, ft.name.mt.name))
colors = [isexported(sig) ? "magenta" : "green" for sig in sigs]

function on_click(event)
    x, y = event.xdata, event.ydata
    normsqrdist(pr) = ((pr[1]-x)/mx1)^2 + ((pr[2]-y)/mx2)^2
    idx = argmin(normsqrdist.(zip(c1, c2)))
    println(sigs[idx])
end
begin
    fig, ax = plt.subplots()
    ax.scatter(c1 .+ 1, c2 .+ 1, c=colors)  # + 1 for the log-scaling
    ax.set_xlabel("# backedges + 1, 1.4")
    ax.set_ylabel("# backedges + 1, 1.6")
    ax.set_xscale("log")
    ax.set_yscale("log")
    fig.canvas.callbacks.connect("button_press_event", on_click)
    fig
end

Note: in the plotting script, if you copy/paste the begin...end block to the REPL and execute it, you can use the zoom features of matplotlib. For reasons I don't understand, they don't show up when you just include the script.

The text was updated successfully, but these errors were encountered:

timholy · 2020-06-30T20:58:33Z

Here's a version of the plot above that combines all MethodInstances for a single Method:

It's a bit less cluttered and likely closer to something we'd want to test: shifting something from ==(::String, ::Any) to ==(::String, ::AbstractString) is still probably worth considering as progress, or neutral at worst, and by aggregating all instances this kind of change would be considered neutral.

I hadn't commented on the ones at the left edge of the plot, which may be particularly concerning. Omitting ones in Core.Compiler, the Method signatures with 10 or more backedges are:

Tuple{typeof(startswith),AbstractString,Union{AbstractChar, Tuple{Vararg{AbstractChar,N} where N}, Set{var"#s69"} where var"#s69"<:AbstractChar, AbstractArray{var"#s70",1} where var"#s70"<:AbstractChar}}
Tuple{typeof(sort!),AbstractArray{T,1} where T,Integer,Integer,Base.Sort.InsertionSortAlg,Base.Order.Ordering}
Tuple{Base.var"##pipeline#593",Any,Any,Any,Bool,typeof(pipeline),Base.AbstractCmd}
Tuple{typeof(Base.StringVector),Integer}
Tuple{typeof(Base.cconvert),Type{var"#s4"} where var"#s4"<:Ptr,Any}
Tuple{typeof(sort!),AbstractArray{T,1} where T,Integer,Integer,Base.Sort.MergeSortAlg,Base.Order.Ordering}
Tuple{typeof(ismutable),Any}
Tuple{Base.var"##string#334",Integer,Integer,typeof(string),Integer}
Tuple{typeof(Base.Meta.isexpr),Any,Any}
Tuple{typeof(Base.Meta.isexpr),Any,Symbol,Int64}
Tuple{typeof(getproperty),Base.AbstractPipe,Symbol}

Other than cconvert and maybe startswith, none of these seem likely to be specialized for anything that might overlap those signatures. Conversely, the ones along the bottom edge (which have all poorly-inferred specializations eliminated) are:

Tuple{typeof(similar),AbstractArray,Type{T},Vararg{Union{Integer, AbstractUnitRange},N} where N} where T
Tuple{typeof(similar),Type{T},Vararg{Union{Integer, AbstractUnitRange},N} where N} where T<:AbstractArray
Tuple{typeof(+),Ptr,Integer}
Tuple{typeof(isempty),AbstractString}
Tuple{typeof(lastindex),AbstractString}
Tuple{typeof(pointer_from_objref),Any}
Tuple{typeof(convert),Type{SubString{S}},AbstractString} where S<:AbstractString
Tuple{typeof(get!),Dict{K,V},Any,Any} where V where K
Tuple{typeof(get!),Union{Function, Type},Dict{K,V},Any} where V where K
Tuple{typeof(merge),NamedTuple,Any}
Tuple{typeof(Base.ensure_indexable),Tuple{Any,Vararg{Any,N} where N}}
Tuple{typeof(to_indices),Any,Tuple}
Tuple{typeof(iterate),Dict,Any}
Tuple{typeof(length),Dict}

and several others. A lot of these are more likely to be specialized.

Overall this paints a picture of considerable progress, a few regressions, and lots still to address.

These are the scripts used for JuliaLang/julia#36393

timholy · 2020-08-24T11:41:40Z

Time for an update! In the data below, by 1.6 I'm including all my still-unmerged PRs, but if most of that gets merged then this will be closer to the real truth anyway...

First, the scatter plot against 1.5:

Very few are above the diagonal; here's a list of some I clicked on and an assessment of what would cause them to be invalidated:

Signature	Potential invalidation trigger	risk
`Tuple{typeof(getproperty),Base.LibuvStream,Symbol}`	new `getproperty` methods for `LibuvStream` subtypes	low (this would essentially be piracy)
`Tuple{typeof(isequal),Char,AbstractChar}`	new `isequal` methods for new subtypes of `AbstractChar`	moderate
`Tuple{Base.var"##pipeline#585",Nothing,Pipe,Nothing,Bool,typeof(pipeline),Base.AbstractCmd}`	new `pipeline` methods for new `AbstractCmd`s	low
`Tuple{typeof(readuntil),Base.AbstractPipe,Char}`	new `readuntil` methods for (new) `AbstractPipe` subtypes	low
`Tuple{typeof(isopen),Base.LibuvStream}`	new `isopen` methods for (new) `LibuvStream` subtypes	low
`Tuple{typeof(Base.CoreLogging.env_override_minlevel),Symbol,Any}`	meths for new logger types	low to moderate (this is not typically specialized or part of the documented extension API)
`Tuple{typeof(haskey),Dict{Char,Any},Any}`	new `haskey` specializations for `Dict`	low (this would be piracy)
`Tuple{typeof(replace),AbstractString,Pair{Regex,String}}`	new `replace` specializations for new strings	high
`Tuple{typeof(print),IO,Any,Char}`		low (people should specialize `show`, not `print`)

But what may tell the story even more directly is a comparison of histograms. This tallies the number of backedges for each MethodInstance with an "at risk" signature, meaning one that my analysis suggests might be vulnerable to invalidation:

This clearly reveals a lot of progress---problematic signatures on 1.6 are likely to have no backedges (the signatures, it should be said, are merged from 1.5 and 1.6, so the ones in the "zero bin" are probably not even MethodInstances on 1.6). Here are the most problematic remaining cases:

                                                                                                            ⋮
                                                         MethodInstance for string(::String, ::Any, ::String) => (nmethods = 19, nbackedges = 275)
                                                  MethodInstance for string(::String, ::Any, ::String, ::Any) => (nmethods = 19, nbackedges = 278)
                                                    MethodInstance for iterate(::Base.Iterators.Pairs, ::Any) => (nmethods = 223, nbackedges = 304)
                       MethodInstance for string(::String, ::Type{Pkg.TOML.Table}, ::String, ::Any, ::String) => (nmethods = 19, nbackedges = 317)
                                         MethodInstance for string(::String, ::Type, ::Vararg{Any,N} where N) => (nmethods = 19, nbackedges = 323)
                                           MethodInstance for setindex!(::IdDict{Symbol,Int64}, ::Any, ::Any) => (nmethods = 89, nbackedges = 363)
                                   MethodInstance for abspath(::AbstractString, ::String, ::String, ::String) => (nmethods = 3, nbackedges = 366)
                                  MethodInstance for joinpath(::AbstractString, ::String, ::String, ::String) => (nmethods = 2, nbackedges = 367)
                                                MethodInstance for convert(::Type{T} where T<:Tuple, ::Tuple) => (nmethods = 184, nbackedges = 409)
                                                          MethodInstance for get(::IOBuffer, ::Symbol, ::Any) => (nmethods = 25, nbackedges = 417)
                                                 MethodInstance for convert(::Type{String}, ::AbstractString) => (nmethods = 184, nbackedges = 426)
                                                                 MethodInstance for isempty(::AbstractString) => (nmethods = 41, nbackedges = 530)
                                                             MethodInstance for *(::AbstractString, ::String) => (nmethods = 369, nbackedges = 545)
                                                        MethodInstance for string(::AbstractString, ::String) => (nmethods = 19, nbackedges = 550)
                                                               MethodInstance for lastindex(::AbstractString) => (nmethods = 13, nbackedges = 596)
                                                        MethodInstance for thisind(::AbstractString, ::Int64) => (nmethods = 4, nbackedges = 599)
                                                        MethodInstance for rem(::AbstractChar, ::Type{UInt8}) => (nmethods = 145, nbackedges = 772)
                                                           MethodInstance for isequal(::Char, ::AbstractChar) => (nmethods = 22, nbackedges = 773)
                                                                MethodInstance for <=(::AbstractChar, ::Char) => (nmethods = 56, nbackedges = 826)
                                                                 MethodInstance for <(::AbstractChar, ::Char) => (nmethods = 75, nbackedges = 827)
                                                            MethodInstance for isless(::AbstractChar, ::Char) => (nmethods = 41, nbackedges = 828)
                                                                MethodInstance for ==(::Char, ::AbstractChar) => (nmethods = 157, nbackedges = 830)
                                                                MethodInstance for ==(::AbstractChar, ::Char) => (nmethods = 157, nbackedges = 845)
             MethodInstance for notify(::Base.GenericCondition{Base.Threads.SpinLock}, ::Any, ::Bool, ::Bool) => (nmethods = 4, nbackedges = 1572)

It's striking how much of this is string-processing, an area I've not really focused on because I don't tend to load a lot of string-focused packages. While I've made a few contributions to this area already, on balance I suspect this will be a good area for others to contribute to reducing our vulnerability.

As fair warning: once my remaining PRs are merged, I plan to contribute a test that is along the lines of capturing the progress in that histogram so that this progress gets "locked in."

Closes #36393

timholy added compiler:inference Type inference compiler:latency Compiler latency labels Jun 23, 2020

This was referenced Jun 28, 2020

Julep: tfuncs for all #36463

Closed

More improvements to inference in IO #36449

Merged

timholy added a commit to timholy/MethodAnalysis.jl that referenced this issue Jun 30, 2020

Add a demo measuring all poorly-inferred backedges

adaaafc

These are the scripts used for JuliaLang/julia#36393

This was referenced Jun 30, 2020

More demos timholy/MethodAnalysis.jl#8

Merged

Fix invalidations from loading OrderedCollections JuliaLang/Pkg.jl#1897

Merged

timholy mentioned this issue Aug 24, 2020

Fix the demos on abstract signatures timholy/MethodAnalysis.jl#16

Merged

timholy added a commit that referenced this issue Aug 25, 2020

Add a generic code-inferrability test

26a03a2

Closes #36393

This was referenced Aug 25, 2020

Add a generic code-inferrability test #37193

Merged

More precise type-specification? JuliaLang/TOML.jl#7

Closed

timholy added a commit that referenced this issue Nov 6, 2020

Add a generic code-inferrability test

92ebc59

Closes #36393

timholy added a commit that referenced this issue Nov 8, 2020

Add a generic code-inferrability test

4a29654

Closes #36393

timholy added a commit that referenced this issue Nov 8, 2020

Add a generic code-inferrability test

57f97d4

Closes #36393

timholy added a commit that referenced this issue Nov 8, 2020

Add a generic code-inferrability test

f27e275

Closes #36393

timholy added a commit that referenced this issue Nov 8, 2020

Add a generic code-inferrability test

c8127ea

Closes #36393

timholy added a commit that referenced this issue Nov 9, 2020

Add a generic code-inferrability test

2a2cfdc

Closes #36393

timholy added a commit that referenced this issue Nov 9, 2020

Add a generic code-inferrability test

fee3f0d

Closes #36393

timholy closed this as completed in #37193 Nov 15, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Julep: attempting to measure (& test?) "absolute" risk of invalidation #36393

Julep: attempting to measure (& test?) "absolute" risk of invalidation #36393

timholy commented Jun 23, 2020 •

edited

Loading

timholy commented Jun 30, 2020 •

edited

Loading

timholy commented Aug 24, 2020 •

edited

Loading

Julep: attempting to measure (& test?) "absolute" risk of invalidation #36393

Julep: attempting to measure (& test?) "absolute" risk of invalidation #36393

Comments

timholy commented Jun 23, 2020 • edited Loading

timholy commented Jun 30, 2020 • edited Loading

timholy commented Aug 24, 2020 • edited Loading

timholy commented Jun 23, 2020 •

edited

Loading

timholy commented Jun 30, 2020 •

edited

Loading

timholy commented Aug 24, 2020 •

edited

Loading