
RFC/DNM: profile-guided despecialization #2597

Closed
wants to merge 1 commit into from

Conversation

@timholy (Contributor) commented Jan 9, 2021

DO NOT MERGE

This is an experiment with a new tool I'm trying to develop, which I'm currently calling "profile-guided despecialization." I'm thinking of it as the converse of (or variant of) profile-guided optimization, a technique perhaps most famously used in the JavaScript compiler to dynamically decide which code deserves to be more aggressively compiled. Since Julia aggressively compiles almost everything, in Julia the "tweak" then must be to reduce specialization, and it should be a "static" (source-code level) manipulation since otherwise the specialization would happen before you ever run it.

This reduces specialization of numerous method arguments.
The list was generated by automated tools and may not be
particularly well-chosen, as it depends on the nature of the workload.
This was generated by running the test suite, which is probably
not representative of real-world workloads.

It shaves about 200s off the time to run the tests, out of ~700s total.

I'd be interested in feedback about whether this seems useful. Since this was built by running the test suite, the profile data (aka, runtime performance) was probably not typical of a real workload, so this probably despecializes more than would be ideal. One way to handle that would be to come up with a workload that is more reflective of the real-world tradeoffs between latency and runtime performance and rerun the tools. (I would need help from the DataFrames developers if we go that path.) Another might be to simply truncate the top part of the despecialize.jl file, since the entries are sorted in increasing order of inference cost.

This reduces specialization of numerous method arguments.
The list was generated by automated tools and may not be
particularly well-chosen, as it depends on the nature of the workload.
This was generated by running the test suite, which is probably
not representative of real-world workloads.

The list is sorted from low-to-high inference time, so the ones towards
the end are more important.

This shaves about 200s off the time to run the tests (out of 700s total).
Similar changes to Base & other dependencies would shave off more,
though always with the risk of runtime penalty.
@bkamins (Member) commented Jan 9, 2021

Thank you for looking into this. We have not resolved #2566 yet (other issues got a higher priority unfortunately).

For me to understand when we have:

m.nospecialize |= 59

what does the 59 (or any number in general) mean, and why is | used?

Going into more detail (this probably is more informative to @nalimilan):

  • some of the functions clearly do not require specialization (I am going from the bottom): _combine_prepare or combine
  • some of the functions that should be compiled aggressively are at the bottom of the list, e.g. do_call or row_group_slots

@timholy - do I get the correct impression that it is good to disable specialization for functions that we know do not do a lot of work, while specialization should be kept for the functions that actually do most of the work?

@bkamins added the non-breaking and performance labels Jan 9, 2021
@bkamins bkamins added this to the 1.0 milestone Jan 9, 2021
@bkamins (Member) commented Jan 9, 2021

@timholy - as a second thought: do you think it would be feasible/would make sense to have some global "compilation aggressiveness" switch in DataFrames.jl?

By default it would be set to "non-aggressive" (as - answering your question about typical workflows - typically in practice compilation takes a significant portion of time in comparison to the actual work for most functions in DataFrames.jl).

Then the user could toggle "aggressive" mode if one knows one is going to work with very big data frames?

@timholy (Contributor, Author) commented Jan 9, 2021

Good questions & ideas!

what does the 59

m.nospecialize is a bitfield: a set bit in m.nospecialize & (1 << i) indicates that the ith argument should not be specialized. I should say that this is me pulling dirty tricks; really that field is for the compiler, not for users. Ultimately the better option is to @nospecialize the method definition, but I don't yet have automated tools to do that, and some methods (kw and bodyfunction methods) don't have a way to independently control specialization of their arguments.

why | is used

So I don't overwrite any user @nospecialize statements.
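As a concrete check of the arithmetic above (plain Julia, independent of DataFrames): 59 == 0b111011, so under the convention described, bits 0, 1, 3, 4, and 5 are set, and OR-ing preserves any bits a user already set by hand:

```julia
nospec = 59                       # the value from the diff above

# Which bits are set? 59 == 0b111011 → bits 0, 1, 3, 4, 5.
set_bits = [i for i in 0:7 if nospec & (1 << i) != 0]
println(set_bits)                 # [0, 1, 3, 4, 5]

# Why |= rather than = : OR-ing keeps bits that were already set,
# e.g. by a hand-written @nospecialize annotation.
existing = 0b000100               # suppose one argument was already marked
println(bitstring(UInt8(existing | nospec)))  # "00111111"
```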

do I get a correct impression that it is good to disable specialization for functions that we know that they do not do a lot of work and specialization should be performed for the functions that actually do the most of work?

Absolutely

should be compiled aggressively are at the bottom of the list, e.g. do_call or row_group_slots

Yes, the test suite probably doesn't place enough demand on runtime to be very accurate. The sorting just reflects total inference time. There is a threshold in my code that omits methods for which the runtime is at least 10% of the inference time, but if they're only processing small data frames in the tests then the runtime may be smaller than that.

Do you think it is feasible/would make sense to have some global "compilation aggressiveness" switch in DataFrames.jl.

Yes, this could be a template. You could have two different settings for m.nospecialize for a set of methods, and toggle between them.

Keep in mind that this won't help much for things that only need to be compiled once, unless that once fits a precompile directive. The main savings on the test suite comes from the fact that these get used for many different types. Another reason it might make sense to rerun this (if you're interested) with a workload that reflects the actual kind of usage you'd like to optimize for your users.

However, by reducing the specialization, you also make a single precompile directive cover a wider range of types, so you might still get a win.
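As a sketch of that last point, with a hypothetical function (not DataFrames code): once an argument is despecialized, a single precompile directive for the widened signature can stand in for many concrete ones:

```julia
# Hypothetical despecialized function: the argument type is not used
# for specialization, so one widened instance can serve many call types.
describe_value(@nospecialize(x)) = "got a $(typeof(x))"

# One directive for the Any signature, rather than one per concrete type:
precompile(describe_value, (Any,))

describe_value(1)
describe_value("hi")
describe_value(:sym)
```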

@bkamins (Member) commented Jan 9, 2021

The main savings on the test suite comes from the fact that these get used for many different types.

The most problematic case in DataFrames.jl is that users very often will use anonymous functions in top-level code. This is the use case that causes most problems (and hopefully can be helped with your changes).

Here is a simple example on 0.22.2:

julia> using DataFrames

julia> df = DataFrame(rand(10^6, 2), :auto);

julia> @time filter(:x1 => x -> x < 0.5 , df); # bad case
  0.366897 seconds (773.60 k allocations: 50.136 MiB, 8.56% gc time)

julia> @time filter(:x1 => x -> x < 0.5 , df); # bad case
  0.153661 seconds (382.27 k allocations: 29.378 MiB)

julia> @time filter(:x1 => x -> x < 0.5 , df); # bad case
  0.157016 seconds (382.26 k allocations: 29.380 MiB)

julia> @time filter(:x1 => <(0.5) , df); # good case
  0.146121 seconds (386.93 k allocations: 29.659 MiB, 5.59% gc time)

julia> @time filter(:x1 => <(0.5) , df); # good case
  0.008190 seconds (28 allocations: 11.571 MiB)

julia> @time filter(:x1 => <(0.5) , df); # good case
  0.006689 seconds (28 allocations: 11.571 MiB)

The issue is that <(0.5) is a special case; in general, in many core functions like filter, select, etc., we will have users passing anonymous functions in global scope (as DataFrames.jl is heavily used in interactive mode).

@timholy (Contributor, Author) commented Jan 9, 2021

diff --git a/src/abstractdataframe/abstractdataframe.jl b/src/abstractdataframe/abstractdataframe.jl
index 75703ca8..9ed1c227 100644
--- a/src/abstractdataframe/abstractdataframe.jl
+++ b/src/abstractdataframe/abstractdataframe.jl
@@ -991,12 +991,13 @@ julia> filter(AsTable(:) => nt -> nt.x == 1 || nt.y == "b", df)
    3 │     1  b
 """
-@inline function Base.filter(f, df::AbstractDataFrame; view::Bool=false)
+@inline function Base.filter(@nospecialize(f), df::AbstractDataFrame; @nospecialize(view::Bool=false))
     rowidxs = _filter_helper(f, eachrow(df))
     return view ? Base.view(df, rowidxs, :) : df[rowidxs, :]
 end
 
-@inline function Base.filter((cols, f)::Pair, df::AbstractDataFrame; view::Bool=false)
+@inline function Base.filter(@nospecialize(colf::Pair), df::AbstractDataFrame; @nospecialize(view::Bool=false))
+    cols, f = colf
     int_cols = index(df)[cols] # it will be AbstractVector{Int} or Int
     if length(int_cols) == 0
         rowidxs = [f() for _ in axes(df, 1)]

reduces the cost by about 20ms for me, but the real costs are in Base.Broadcast. I might take a quick peek and see if I can improve that, but in the meantime is this kind of change something that's viable for you?

The @nospecialize(view::Bool=false) may seem silly, but it seems to prevent recompilation triggered by constant-propagation.

@bkamins (Member) commented Jan 9, 2021

The @nospecialize(view::Bool=false) may seem silly, but it seems to prevent recompilation triggered by constant-propagation.

😮 - I would have never thought about it.

I understand that the downside would be that the call to filter will not be type-stable with constant propagation - right? (this is what @nalimilan wanted, but I think I would prefer to cut down compilation time rather than have type-stable return type from filter)

@timholy (Contributor, Author) commented Jan 9, 2021

@nospecialize is a bit funny: first, it's a "hint" not a command, and second it technically prevents codegen rather than inference. However, when called from a non-inferrable location it does prevent specialization by runtime dispatch.

The net effect is quite subtle, and I still don't always predict correctly how it will work out. Here's an example:

julia> f(@nospecialize(::Integer)) = 1
f (generic function with 1 method)

julia> f(@nospecialize(::AbstractFloat)) = 2
f (generic function with 2 methods)

julia> g(c) = map(f, c)
g (generic function with 1 method)

julia> g([1,2])
2-element Vector{Int64}:
 1
 1

julia> using MethodAnalysis

julia> methodinstances(f)       # surprise!
1-element Vector{Core.MethodInstance}:
 MethodInstance for f(::Int64)

julia> g(AbstractFloat[1,2])
2-element Vector{Int64}:
 2
 2

julia> methodinstances(f)       # probably more along the lines of what you expected
2-element Vector{Core.MethodInstance}:
 MethodInstance for f(::Int64)
 MethodInstance for f(::AbstractFloat)

julia> Base.return_types(g, (Vector{Int},))
1-element Vector{Any}:
 Vector{Int64} (alias for Array{Int64, 1})

julia> Base.return_types(g, (Vector{AbstractFloat},))
1-element Vector{Any}:
 Vector{Int64} (alias for Array{Int64, 1})

The last one comes back as Vector{Int} because all applicable callees of f(::AbstractFloat) return an Int. Were that not the case, or were there more than 3 of them, then inference would indeed have failed in the latter but not the former case.

@bkamins (Member) commented Jan 9, 2021

And regarding broadcasting - I have noticed that moving from broadcasting to using a comprehension does not hurt the performance and often improves compilation - is this something that can be expected (we could re-work the internals to use broadcasting less then). Thank you!

@timholy (Contributor, Author) commented Jan 9, 2021

Yes, broadcasting codegen is pretty expensive. Currently that filter operation returns a BitArray, which you can't get from a comprehension, but if you don't care about that then the comprehension should be faster to reinfer.
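A minimal illustration of that difference (plain Julia, made-up data): broadcasting a Bool-returning predicate yields a BitVector, while the comprehension yields a Vector{Bool} with the same contents:

```julia
xs = rand(100)
pred = x -> x < 0.5

mask_bc  = pred.(xs)               # broadcasting → BitVector
mask_cmp = [pred(x) for x in xs]   # comprehension → Vector{Bool}

@assert mask_bc isa BitVector
@assert mask_cmp isa Vector{Bool}
@assert mask_bc == mask_cmp        # same logical result
```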

@bkamins (Member) commented Mar 28, 2021

Ultimately the better option is to @nospecialize the method definition, but I don't yet have automated tools to do that and some methods (kw and bodyfunction methods) don't have a way to independently control specialization of their arguments.

@timholy - we are mostly done with the functionality of DataFrames.jl for 1.0 release and I am going now to work on despecialization + splitting methods to reduce compile cost.

I would appreciate if you commented on the following issues:

  • we mostly have functions that are either cheap or expensive; by "cheap" I mean that they do very little work and are not likely to be called in hot-loops etc. (they might call expensive functions, but this is OK if I understand things correctly); is this effectively equivalent to what you propose to wrap cheap functions in @nospecialize before and @specialize after the function? (so to turn off specialization for all their arguments?); can there be benefits of doing @nospecialize on only selected arguments in such cases?
  • if we wrap function in @nospecialize + @specialize block is then there any benefit of doing method splitting for it (as discussed in Split methods to reduce compile cost of argument diversity #2563) - again I am talking about cheap functions here (we have many functions that have quite complex logic, but effectively do little work and delegate the compute intensive parts to other functions that would be specialized). Or it still will be beneficial to split such a function in some cases.
  • how all these mechanisms play with inlining? I.e. when I split-out the part of the function that performs some compute intensive task from a bigger function that is wrapped in @nospecialize do I have to annotate this subroutine with @noinline to make sure that the compiler does not inline it (effectively causing it to be not-specialized)?

Thank you!
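One way to picture the pattern asked about in the first and third bullets (hypothetical names; a sketch, not actual DataFrames internals): a cheap dispatcher with all of its arguments despecialized, delegating to a @noinline kernel that still specializes:

```julia
# Cheap dispatcher: a bare @nospecialize in the body applies to all arguments.
function cheap_combine(f, xs)
    @nospecialize
    # complex-but-cheap logic would go here; the hot part is delegated
    return _combine_kernel(f, xs)
end

# @noinline keeps the kernel out of the despecialized caller, so it remains
# a separate compilation unit that can specialize on concrete argument types.
@noinline _combine_kernel(f, xs) = map(f, xs)

cheap_combine(abs, [-1, 2, -3])    # → [1, 2, 3]
```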

@bkamins (Member) commented Mar 30, 2021

@timholy In #2691 I came to the conclusion that, instead of despecialization, it is better to pass the values I do not want inferred wrapped in Ref{Any}. Do you see any downsides of such an approach?
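For reference, the Ref{Any} pattern mentioned here looks roughly like this (hypothetical names, not the #2691 code): boxing the value erases its concrete type from inference's point of view, so downstream code is compiled for Any and the call goes through runtime dispatch:

```julia
function run_with_barrier(f, x)
    boxed = Ref{Any}(f)        # inference only knows boxed[] isa Any
    return _apply_boxed(boxed, x)
end

# Compiled once for Ref{Any}; the inner call dispatches at runtime,
# so _apply_boxed is not respecialized for each function type.
_apply_boxed(boxed::Ref{Any}, x) = (boxed[])(x)

run_with_barrier(sqrt, 4.0)    # → 2.0
```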

@bkamins (Member) commented Apr 11, 2021

@timholy - thank you for raising this issue. We are now manually despecializing either with @nospecialize or Ref{Any} following your suggestions, so I am closing this PR.

Thank you!
