
Inference performance regression vs 1.1 (MDDatasets.jl) #33336

Closed
KristofferC opened this issue Sep 20, 2019 · 11 comments · Fixed by #33476
Labels
compiler:inference Type inference compiler:latency Compiler latency regression Regression in behavior compared to a previous version

Comments


KristofferC commented Sep 20, 2019

For MDDatasets the following code

using MDDatasets

xingsrise = CrossType(:rise)

# Required mostly to test the cross function:
y = [0,0,0,0,1,-3,4,5,6,7,-10,-5,0,0,0,-5,-10,10,-3,1,-4,0,0,0,0,0,0]
d10=DataF1(collect(1:length(y)), y)
@time _x = meas(:xcross, Event, d10, allow=xingsrise)

takes 80 seconds in 1.1 but seems to hang indefinitely on 1.2 and 1.3. Interrupting gives a stack trace deep into inference.

In addition, just loading MDDatasets makes the REPL extremely sluggish for a while on 1.2 and 1.3 (is everything getting invalidated?). This is also noticeable on 1.1, but to a much smaller degree.

Here is an example that shows the REPL experience after loading the package (note that the recording shows the keys pressed on the keyboard). After loading the package and typing 1, it takes about 6 seconds for the character to appear in the REPL input field. Pressing enter then takes ~10 seconds for the result to show:

[screen recording: Sep-20-2019 11-34-28]

Edit:

1.2 finished:

julia> @time _x = meas(:xcross, Event, d10, allow=xingsrise)
6430.483092 seconds (8.14 G allocations: 511.835 GiB, 18.72% gc time)
DataF1(x=[1, 2, 3],y=[6.428571428571429, 17.5, 19.75])

1.3 finished:

julia> @time _x = meas(:xcross, Event, d10, allow=xingsrise)
	8414.099232 seconds (10.91 G allocations: 684.973 GiB, 4.87% gc time)
DataF1(x=[1, 2, 3],y=[6.428571428571429, 17.5, 19.75])
@KristofferC KristofferC added regression Regression in behavior compared to a previous version compiler:latency Compiler latency labels Sep 20, 2019
@KristofferC KristofferC changed the title Inference performance regression vs 1.1 Inference performance regression vs 1.1 (MDDatasets.jl) Sep 20, 2019

maleadt commented Sep 20, 2019

Bisected to 8c44566/#31191 (cc @vtjnash). I had my script time out at 300s; right before 8c44566 the script takes 152s vs. about 80s on v1.1.0, so there is another, earlier regression, but at least this is the most problematic one. I'll do another bisect overnight.


maleadt commented Sep 21, 2019

Further bisect points to #30577 as the cause for the slowdown (cc @JeffBezanson):

commit e456a72b033e3da8ade872a856ad4415a8924663
Author: Jeff Bezanson <jeff.bezanson@gmail.com>
Date:   Sat Feb 2 14:52:32 2019 -0500

    define core NamedTuple constructor without generated functions

@KristofferC

> Further bisect points to #30577 as the cause for the slowdown

You mean the one from 80s -> 152s?


maleadt commented Sep 21, 2019

Yes.

@JeffBezanson

That's probably because that commit allows us to better infer NamedTuple constructors, so inference is doing more work.

@KristofferC

I don't know if we should put a 1.3 milestone on this. It is not a regression relative to 1.2, but if we had found it during 1.2 development we probably would have milestoned it.

@JeffBezanson

The operator definitions in MDDatasets (src/datasetop_reg.jl) seem to be causing nearly everything to be invalidated.

@KristofferC

KristofferC commented Oct 3, 2019

Even with everything invalidated, it now takes 2 hours to run. That should be enough time to recompile everything many times over?

@JeffBezanson

Yes there are probably multiple issues here.

@JeffBezanson

When inferring meas there is a lot of repetitive work; for example, here is part of an inference trace piped through sort | uniq -c:

    172 Tuple{typeof(Base.:(>)), MDDatasets.DataF1{TX, TY} where TY<:Number where TX<:Number, Int64}
    119 Tuple{typeof(Base.:(!=)), Int64, MDDatasets.DataF1{TX, TY} where TY<:Number where TX<:Number}
     62 Tuple{typeof(MDDatasets.error_mismatchedsweep), Array{MDDatasets.PSweep{T} where T, 1}, Array{MDDatasets.PSweep{T} where T, 1}}
     62 Tuple{typeof(Base.string), String, Array{MDDatasets.PSweep{T} where T, 1}, String, Array{MDDatasets.PSweep{T} where T, 1}}
     32 Tuple{typeof(MDDatasets._broadcast), Type{T} where T, Array{MDDatasets.PSweep{T} where T, 1}, typeof(Base.:(>)), MDDatasets.DataHR{T1} where T1, Int64}
     32 Tuple{typeof(MDDatasets._broadcast), Type, Array{MDDatasets.PSweep{T} where T, 1}, typeof(Base.:(!=)), Int64, MDDatasets.DataHR{T2} where T2}
     32 Tuple{typeof(MDDatasets.broadcastMDSweep), Array{MDDatasets.PSweep{T} where T, 1}, Int64}
     32 Tuple{typeof(MDDatasets.broadcastMD), MDDatasets.CastType2{Number, 1, Number, 2}, typeof(Base.:(>)), MDDatasets.DataHR{T1}, Int64} where T1
     32 Tuple{typeof(MDDatasets.broadcastMD), MDDatasets.CastType2{Number, 1, Number, 2}, typeof(Base.:(!=)), Int64, MDDatasets.DataHR{T2}} where T2
     32 Tuple{typeof(Base.unsafe_convert), Type{Ptr{T}}, AbstractArray{T, N} where N} where T
     32 Tuple{typeof(Base.unsafe_convert), Type{Ptr{Nothing}}, Union{Array{Int128, 1}, Array{Int16, 1}, Array{Int32, 1}, Array{Int64, 1}, Array{Int8, 1}, Array{UInt128, 1}, Array{UInt16, 1}, Array{UInt32, 1}, Array{UInt64, 1}, Array{UInt8, 1}}}
     32 Tuple{typeof(Base.unsafe_convert), Type{Ptr{_A}} where _A, Union{Array{Int128, 1}, Array{Int16, 1}, Array{Int32, 1}, Array{Int64, 1}, Array{Int8, 1}, Array{UInt128, 1}, Array{UInt16, 1}, Array{UInt32, 1}, Array{UInt64, 1}, Array{UInt8, 1}}}
     32 Tuple{typeof(Base.string), String, Type{#s64} where #s64<:AbstractArray{T, N} where N where T}
     32 Tuple{typeof(Base.show_datatype), Base.GenericIOBuffer{Array{UInt8, 1}}, Type{#s64} where #s64<:AbstractArray{T, N} where N where T}
     32 Tuple{typeof(Base.show), Base.GenericIOBuffer{Array{UInt8, 1}}, Type{#s64} where #s64<:AbstractArray{T, N} where N where T}
     32 Tuple{typeof(Base.print_to_string), String, Type{#s64} where #s64<:AbstractArray{T, N} where N where T}
     32 Tuple{typeof(Base.print), Base.GenericIOBuffer{Array{UInt8, 1}}, Type{#s64} where #s64<:AbstractArray{T, N} where N where T}
     32 Tuple{typeof(Base._memcmp), Union{Array{Int128, 1}, Array{Int16, 1}, Array{Int32, 1}, Array{Int64, 1}, Array{Int8, 1}, Array{UInt128, 1}, Array{UInt16, 1}, Array{UInt32, 1}, Array{UInt64, 1}, Array{UInt8, 1}}, Union{Array{Int128, 1}, Array{Int16, 1}, Array{Int32, 1}, Array{Int64, 1}, Array{Int8, 1}, Array{UInt128, 1}, Array{UInt16, 1}, Array{UInt32, 1}, Array{UInt64, 1}, Array{UInt8, 1}}, Int64}

Maybe there is a large cycle of calls that don't get cached?
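As an aside, the aggregation produced by the shell pipeline above can also be done in Julia itself. A minimal sketch (the function name `count_lines` and the sample input are illustrative, not from this issue):

```julia
# Count identical trace lines and list the most frequent first,
# mimicking `sort trace.txt | uniq -c | sort -rn`.
function count_lines(lines::Vector{String})
    counts = Dict{String,Int}()
    for l in lines
        counts[l] = get(counts, l, 0) + 1
    end
    # Sort by count, descending, as `sort -rn` would.
    sort(collect(counts); by = last, rev = true)
end

count_lines(["f(x)", "g(x)", "f(x)"])  # → ["f(x)" => 2, "g(x)" => 1]
```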

@JeffBezanson JeffBezanson added the compiler:inference Type inference label Oct 3, 2019
@JeffBezanson

One issue is that MDDatasets defines methods of == and (much worse) != that do not appear to return booleans. We almost always rely on != simply calling ==, but the package overrides that, and also changes the return type. I don't think this is the whole cause of the problem, but it's an abusive definition I happened to notice while investigating.
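To illustrate the pattern being described, here is a hypothetical, simplified stand-in (not the actual MDDatasets source). Base defines the fallback `!=(x, y) = !(x == y)`, so generic code is compiled assuming `!=` returns a `Bool`; a package can break both assumptions like this:

```julia
# Hypothetical simplified stand-in for the pattern described above,
# not the actual MDDatasets code.
struct DataF1
    x::Vector{Float64}
    y::Vector{Float64}
end

import Base: ==, !=

# These methods return a data object, not a Bool. Code compiled against
# the Base fallback `!=(x, y) = !(x == y)` (and its Bool result) can be
# invalidated and must be recompiled once such methods are defined.
==(d::DataF1, n::Number) = DataF1(d.x, Float64.(d.y .== n))
!=(d::DataF1, n::Number) = DataF1(d.x, Float64.(d.y .!= n))

d = DataF1([1.0, 2.0, 3.0], [0.0, 5.0, 0.0])
r = d != 0   # a DataF1, not a Bool
```

Here `r.y` is `[0.0, 1.0, 0.0]`: an elementwise comparison result rather than a scalar truth value, which is what "also the type" changing refers to.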

JeffBezanson added a commit that referenced this issue Oct 5, 2019
Fixes #33336.
This addresses some commonly-occurring cases where having too little
type info makes inference see a lot of recursion in Base that is
not actually possible.
JeffBezanson added a commit that referenced this issue Oct 7, 2019
…3476)

Fixes #33336.
This addresses some commonly-occurring cases where having too little
type info makes inference see a lot of recursion in Base that is
not actually possible.
KristofferC pushed a commit that referenced this issue Oct 8, 2019
…3476)

Fixes #33336.
This addresses some commonly-occurring cases where having too little
type info makes inference see a lot of recursion in Base that is
not actually possible.

(cherry picked from commit d5d5718)