Base.precompile is highly effective - can it be auto-inserted? Documented? #12897

Closed · IainNZ opened this issue on Sep 1, 2015 · 14 comments
Labels: compiler:precompilation, docs, performance

@IainNZ (Member) commented on Sep 1, 2015

This is a very contrived reduced example to demonstrate my point, but I see the same effect in real JuMP and JuMPeR code.

Basically, Base.precompile is capable of slashing first-run times for functions by factors of 2 to 4 in my tests. For users at the REPL, or even those just running small scripts, this greatly improves the user experience. Getting a simple example is tricky, but consider the following code. Right at the bottom is a commented-out call to Base.precompile.

Here is my test script:

tic()
using PrecompTest
toc()
@time PrecompTest.complicated(10,1.0,2.0,3.0,4.0,5.0)
@time PrecompTest.complicated(10,1.0,2.0,3.0,4.0,5.0)
@time PrecompTest.complicated(10,1.0,2.0,3.0,4.0,5.0)

Without Base.precompile

elapsed time: 0.27908636 seconds
  0.168520 seconds (337.81 k allocations: 15.117 MB, 11.94% gc time)
  0.000021 seconds (5 allocations: 176 bytes)
  0.000021 seconds (5 allocations: 176 bytes)

With Base.precompile

elapsed time: 0.267551923 seconds
  0.054533 seconds (6.53 k allocations: 331.406 KB)
  0.000023 seconds (5 allocations: 176 bytes)
  0.000025 seconds (5 allocations: 176 bytes)

That is a 3x speed-up for the first run.

In this case, because all the argument types in the method signature are concrete, it would actually be possible to auto-generate that Base.precompile call (see the sketch after the module listing below). My first question/idea would be to auto-generate that call whenever the package is precompiled. Second, is documenting this (and the other great compile-time trick, avoiding kwargs) desirable? I could see an argument against documenting it, given how deep it gets into internals that can and hopefully will change over time.

Here is the (contrived example) test module, PrecompTest.jl:

__precompile__()

module PrecompTest

    # Deliberately long, repetitive body so that compiling this method dominates the first call.
    function complicated(N::Int, foo::Float64, bar::Float64, fizz::Float64, buzz::Float64, yo::Float64)
        data_so_far = foo + bar + fizz + buzz + yo
        for i in 1:N
            data_so_far += sqrt(cos(sin(data_so_far))^2)
        end
        temp1 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp1 += sqrt(cos(sin(temp1))^2)
        end
        temp2 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp2 += sqrt(cos(sin(temp2))^2)
        end
        temp3 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp3 += sqrt(cos(sin(temp3))^2)
        end
        temp4 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp4 += sqrt(cos(sin(temp4))^2)
        end
        temp5 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp5 += sqrt(cos(sin(temp5))^2)
        end
        temp1 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp1 += sqrt(cos(sin(temp1))^2)
        end
        temp2 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp2 += sqrt(cos(sin(temp2))^2)
        end
        temp3 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp3 += sqrt(cos(sin(temp3))^2)
        end
        temp4 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp4 += sqrt(cos(sin(temp4))^2)
        end
        temp5 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp5 += sqrt(cos(sin(temp5))^2)
        end
        temp1 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp1 += sqrt(cos(sin(temp1))^2)
        end
        temp2 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp2 += sqrt(cos(sin(temp2))^2)
        end
        temp3 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp3 += sqrt(cos(sin(temp3))^2)
        end
        temp4 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp4 += sqrt(cos(sin(temp4))^2)
        end
        temp5 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp5 += sqrt(cos(sin(temp5))^2)
        end
        temp1 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp1 += sqrt(cos(sin(temp1))^2)
        end
        temp2 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp2 += sqrt(cos(sin(temp2))^2)
        end
        temp3 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp3 += sqrt(cos(sin(temp3))^2)
        end
        temp4 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp4 += sqrt(cos(sin(temp4))^2)
        end
        temp5 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp5 += sqrt(cos(sin(temp5))^2)
        end
        temp1 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp1 += sqrt(cos(sin(temp1))^2)
        end
        temp2 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp2 += sqrt(cos(sin(temp2))^2)
        end
        temp3 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp3 += sqrt(cos(sin(temp3))^2)
        end
        temp4 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp4 += sqrt(cos(sin(temp4))^2)
        end
        temp5 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp5 += sqrt(cos(sin(temp5))^2)
        end
        temp1 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp1 += sqrt(cos(sin(temp1))^2)
        end
        temp2 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp2 += sqrt(cos(sin(temp2))^2)
        end
        temp3 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp3 += sqrt(cos(sin(temp3))^2)
        end
        temp4 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp4 += sqrt(cos(sin(temp4))^2)
        end
        temp5 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp5 += sqrt(cos(sin(temp5))^2)
        end
        temp1 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp1 += sqrt(cos(sin(temp1))^2)
        end
        temp2 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp2 += sqrt(cos(sin(temp2))^2)
        end
        temp3 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp3 += sqrt(cos(sin(temp3))^2)
        end
        temp4 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp4 += sqrt(cos(sin(temp4))^2)
        end
        temp5 = foo + bar + fizz + buzz + yo
        for i in 1:N
            temp5 += sqrt(cos(sin(temp5))^2)
        end
        return data_so_far+temp1*temp2+temp3*temp4+temp5
    end
    #Base.precompile(complicated, (Int, Float64, Float64, Float64, Float64, Float64))
end
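
Here is a minimal sketch of what that auto-generation could look like, purely for illustration: it is not an existing Base feature, the helper name is my own, and it is written with current Julia reflection names (methods, Base.unwrap_unionall, isconcretetype, precompile). It emits a directive only when every declared argument type of a method is concrete:

function precompile_concrete_methods(f)
    for m in methods(f)
        sig = Base.unwrap_unionall(m.sig)         # Tuple{typeof(f), ArgTypes...}
        argtypes = collect(sig.parameters)[2:end] # drop the function's own type
        # Only fully concrete signatures (no abstract or Vararg arguments)
        # can be compiled ahead of time like this.
        if all(t -> t isa Type && isconcretetype(t), argtypes)
            precompile(f, Tuple(argtypes))
        end
    end
end

# For the module above, this reproduces the commented-out directive:
# precompile_concrete_methods(PrecompTest.complicated)
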
@stevengj (Member) commented on Sep 1, 2015

It's documented at http://docs.julialang.org/en/latest/stdlib/base/?highlight=precompile#Base.precompile but I guess you mean in the manual section on modules and precompilation.

@stevengj (Member) commented on Sep 1, 2015

I have to say that it worries me to auto-generate precompile(foo, ...) whenever all of foo's arguments are declared with concrete types, which is what you seem to be suggesting.

The problem is that it pushes people to declare all arguments as concrete types to improve (first-time) performance, which is precisely the opposite of what they should be doing to have generic, Julian code.

If this is a common issue that isn't solved by better documentation, maybe modules could push an atexit function that writes to a database tracking what method signatures were actually compiled, and this could be used as a hint for future precompilations? But it seems better to wait until we have more experience with precompilation in the wild before doing anything radical.
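
A rough sketch of that idea, just to make it concrete; this is not an existing mechanism, and compiled_signatures below is a hypothetical helper standing in for whatever introspection would enumerate the specializations compiled during the session (atexit itself is a real Base function):

const SIGNATURE_LOG = joinpath(homedir(), ".julia_precompile_hints")  # hypothetical location

function log_compiled_signatures(mod::Module)
    atexit() do
        open(SIGNATURE_LOG, "a") do io
            for sig in compiled_signatures(mod)   # hypothetical helper
                println(io, sig)                  # one method signature per line
            end
        end
    end
end

# A package could call log_compiled_signatures(@__MODULE__) from __init__, and a later
# precompilation pass could read this log and emit the corresponding precompile calls.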

@vtjnash (Member) commented on Sep 1, 2015

Just adding the --compile=all flag may help here, although there may still be work to do to ensure that it is hooked up correctly for the incremental compile option.

@IainNZ (Member, Author) commented on Sep 1, 2015

Is there a way to get all the method signatures that were compiled?

@vtjnash (Member) commented on Sep 1, 2015

I tried making a demo of something similar a while back; I can't say whether it even works anymore: 4802dcd

@alyst (Contributor) commented on Sep 1, 2015

Is it possible to precompile the functions encountered in the test/example folders of a package?

@timholy (Member) commented on Sep 1, 2015

It's interesting how many people started thinking about this all at once: I was about halfway towards a solution when I read this. I developed an approach that gets the compiler to dump the functions and types it's compiling, together with a measure of how long it spent compiling each one. See https://github.com/timholy/SnoopCompile.jl, and pull requests at GiovineItalia/Gadfly.jl#673 and GiovineItalia/Compose.jl#143.

@vtjnash, probably the part I'd like the most feedback on is this bit. I'm not quite sure how to handle precompiles like this:

precompile(Base.foo, (Array{Compose.Circle,1},))

because Base doesn't know anything about Compose.Circle. Does each compiled function have to be stored in its parent module? Or can the Gadfly module store a compiled version of a function that lives in Base but which ran on types not defined in Base?

Incidentally, of the ~2000 functions compiled by running https://github.com/timholy/SnoopCompile.jl/blob/03543b7124cc121a2e03e4f105eac73d74a9596a/examples/gadfly.jl#L14-L17, approximately half were in Base. So the cross-module question is a pretty big deal. Those two PRs to Gadfly and Compose only included functions defined in those modules.
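
To make the placement question concrete, here is a self-contained analogue (my own illustration, not a statement of how the cache actually behaves): a downstream module precompiling a Base method whose signature involves a type defined locally. Syntactically, the directive can only be written where both the function and the type are in scope, i.e. in the downstream module; whether the resulting specialization then lands in that module's cache file, in Base's, or nowhere is exactly the open question.

module CrossModuleSketch          # hypothetical stand-in for a package like Gadfly
    struct Circle                 # stand-in for Compose.Circle
        r::Float64
    end
    area(c::Circle) = pi * c.r^2
    # The generic function (map) lives in Base, but the signature mentions a
    # locally defined type, so the directive has to be emitted here:
    precompile(map, (typeof(area), Vector{Circle}))
end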

@IainNZ (Member, Author) commented on Sep 1, 2015

Hah wow, we really are thinking about the same things! SnoopCompile looks fun; going to play with that.

@vtjnash (Member) commented on Sep 1, 2015

I didn't implement any cross-module caching for fear that it would slow down loading.

@timholy (Member) commented on Sep 2, 2015

Fine. So it's fair to say that, for now, this is basically the best we can do?

@timholy (Member) commented on Sep 2, 2015

For those interested, I posted some numbers over at GiovineItalia/Gadfly.jl#673 (comment)

@KristofferC (Member):

Amazing little tool, @timholy. Thanks!

@ViralBShah (Member):

Closing since we have SnoopCompile.jl.

@rikhuijzer (Contributor):

This functionality is now available via PrecompileSignatures.jl. The package has zero dependencies and generates a few hundred precompile directives for Pluto.jl in less than a second from a fresh Julia instance. These directives are generated and evaluated during the precompilation phase. The package shaves a few seconds off the TTFX for Pluto and is part of the most recent Pluto release (0.19.4). See fonsp/Pluto.jl#2054 for benchmarks, but note that I also removed some manually added precompile directives there, so the effect would be greater without that change.

In practice, after trying PrecompileSignatures.jl on a bunch of packages, my takeaway is that the reduction in TTFX falls somewhere between actually running code and not running any code or precompile directives at all. For Makie.jl, running a real-world workload reduces TTFX by more than PrecompileSignatures does. This makes sense, because precompile can only compile inferable calls, whereas running code also compiles runtime dispatches.

So, the package is mostly useful for packages where running real-world workloads is infeasible due to side-effects or too many entrypoints.
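
For anyone skimming, a small self-contained illustration of that last point (my own example, unrelated to PrecompileSignatures.jl internals): a precompile directive only covers calls that inference can see, while actually running the code also compiles the targets of runtime dispatch.

process(x::Int) = x + 1
process(x::Float64) = x + 1.0

function workload(xs::Vector{Any})
    s = 0.0
    for x in xs
        s += process(x)   # runtime dispatch: the target is only known when this runs
    end
    return s
end

precompile(workload, (Vector{Any},))  # compiles workload itself, but not process(::Int) etc.
workload(Any[1, 2.0])                 # running it also compiles process(::Int) and process(::Float64)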
