Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster startup of generated JLL packages #812

Merged
merged 2 commits into from
Jun 18, 2020

Conversation

PallHaraldsson
Copy link
Contributor

No description provided.

@giordano
Copy link
Member

giordano commented Jun 5, 2020

I keep asking you the question but not getting an answer: did you actually benchmark this and found real improvements?

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Jun 5, 2020

I only tested and confirmed speed-up this way (since this PR, I noticed your seemingly contradicting answer elsewhere, you got lower time than either, but strangely also fewer allocations than I do on 1.5.0-beta1.0):

$ ~/julia-1.6-DEV-latest-7c980c6af5/bin/julia --startup-file=no -q

julia> @time using FFMPEG_jll
  1.740090 seconds (1.44 M allocations: 84.585 MiB, 0.90% gc time)

$ ~/julia-1.6-DEV-latest-7c980c6af5/bin/julia --startup-file=no -O1 -q

julia> @time using FFMPEG_jll
  1.395736 seconds (1.44 M allocations: 84.616 MiB, 0.61% gc time)

I made sure I was up-to-date before:

(@v1.5) pkg> up FFMPEG_jll
   Updating registry at `~/.julia/registries/General`
   Updating git-repo `https://github.com/JuliaRegistries/General.git`
  Installed RecipesPipeline ─ v0.1.10
No Changes to `~/.julia/environments/v1.5/Project.toml`
Updating `~/.julia/environments/v1.5/Manifest.toml`
  [01d81517] ↑ RecipesPipeline v0.1.9 ⇒ v0.1.10

(@v1.5) pkg> up FFMPEG_jll#master

@PallHaraldsson PallHaraldsson changed the title Faster statup of generated JLL packages Faster startup of generated JLL packages Jun 5, 2020
@giordano
Copy link
Member

giordano commented Jun 5, 2020

Don't start Julia with -O1, that's not a way to actually benchmark the patch. You are not testing the changes you propose.

As I've shown in JuliaBinaryWrappers/FFMPEG_jll.jl#1 (comment), I see a very slight regression (but too small to be taken seriously, can also be random jitter), but no concrete improvement anyway.

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Jun 8, 2020

The reason my change was ineffective (but not slower) when only applied to FFMPEG_jll (or FFMPEG.jl), is that it has many dependencies, and for (some of) those it matters (what -O0 or -O1 does differently is applying to all dependencies too, what this PR does), at least for this one (benefit would add up by applying to more/all packages, while for some timing wouldn't change much):

https://github.com/JuliaBinaryWrappers/LAME_jll.jl/blob/master/src/LAME_jll.jl

The best time I ever got over several tries for that one package unchanged was something over 400 ms for min.

With the change I got almost 20 ms off, for just that one JLL:

pharaldsson_sym@SYMLINUX011:~$ hyperfine '~/julia-1.6-DEV-latest-7c980c6af5/bin/julia --startup-file=no -e "using LAME_jll"'
Benchmark #1: ~/julia-1.6-DEV-latest-7c980c6af5/bin/julia --startup-file=no -e "using LAME_jll"
  Time (mean ± σ):     422.9 ms ±  41.1 ms    [User: 849.2 ms, System: 469.5 ms]
  Range (min … max):   382.4 ms … 508.6 ms    10 runs
 
pharaldsson_sym@SYMLINUX011:~$ hyperfine '~/julia-1.6-DEV-latest-7c980c6af5/bin/julia --startup-file=no -e "using LAME_jll"'
Benchmark #1: ~/julia-1.6-DEV-latest-7c980c6af5/bin/julia --startup-file=no -e "using LAME_jll"
  Time (mean ± σ):     429.8 ms ±  31.6 ms    [User: 805.7 ms, System: 519.7 ms]
  Range (min … max):   388.0 ms … 486.4 ms    10 runs
 
pharaldsson_sym@SYMLINUX011:~$ hyperfine '~/julia-1.6-DEV-latest-7c980c6af5/bin/julia --startup-file=no -e "using LAME_jll"'
Benchmark #1: ~/julia-1.6-DEV-latest-7c980c6af5/bin/julia --startup-file=no -e "using LAME_jll"
  Time (mean ± σ):     418.2 ms ±  31.8 ms    [User: 833.9 ms, System: 480.3 ms]
  Range (min … max):   384.5 ms … 476.5 ms    10 runs
 
pharaldsson_sym@SYMLINUX011:~$ hyperfine '~/julia-1.6-DEV-latest-7c980c6af5/bin/julia --startup-file=no -e "using LAME_jll"'
Benchmark #1: ~/julia-1.6-DEV-latest-7c980c6af5/bin/julia --startup-file=no -e "using LAME_jll"
  Time (mean ± σ):     428.1 ms ±  37.2 ms    [User: 837.2 ms, System: 485.9 ms]
  Range (min … max):   382.1 ms … 493.4 ms    10 runs

Startup of Julia is 157.9 ms, what you need to subtract from all numbers, to get relative difference, otherwise it's seemingly lower.

@PallHaraldsson
Copy link
Contributor Author

Bump.

I think I showed some minor speedup, and enabled on all should be more. I think we should merge this/test. We then have time to revert before 1.5 is release if it turns out to be a problem.

@fingolfin
Copy link
Member

Those benchmark numbers are incredibly noisy (are you perhaps running the test on a laptop?) and so I wouldn't trust any deduction from that. So here is what I did to improve on that:

  • I repeated your test on one of our compute servers; I picked one where nobody else was doing computations (I started out without considering this, and the result was that tests fluctuated well beyond the standard deviation if I repeated them, say, 10 minutes later).
  • I made sure to run everything on an actual disk drive, not the NFS used for my home dir
  • I passed --warmup 2 -r 50 to hyperfine to suppress noise introduced by precompilation and FS caches
  • finally, the test was conducted with 1.5.0-beta1 so we have a fixed reference point (as opposed to using some random commit on master)

Here is what I got. First the startup speeds with default optimization / -O1 / -O0

seress:/tmp/juliadepot$ JULIA_DEPOT_PATH=/tmp/juliadepot hyperfine 'julia-1.5.0-beta1 --startup-file=no -e 1' --warmup 2 -r 50
Benchmark #1: julia-1.5.0-beta1 --startup-file=no -e 1
  Time (mean ± σ):      95.5 ms ±   0.6 ms    [User: 70.4 ms, System: 49.7 ms]
  Range (min … max):    94.5 ms …  98.0 ms    50 runs

seress:/tmp/juliadepot$ JULIA_DEPOT_PATH=/tmp/juliadepot hyperfine 'julia-1.5.0-beta1 --startup-file=no -O1 -e 1' --warmup 2 -r 50
Benchmark #1: julia-1.5.0-beta1 --startup-file=no -O1 -e 1
  Time (mean ± σ):      91.6 ms ±   0.6 ms    [User: 67.3 ms, System: 42.6 ms]
  Range (min … max):    90.3 ms …  93.6 ms    50 runs

seress:/tmp/juliadepot$ JULIA_DEPOT_PATH=/tmp/juliadepot hyperfine 'julia-1.5.0-beta1 --startup-file=no -O0 -e 1' --warmup 2 -r 50
Benchmark #1: julia-1.5.0-beta1 --startup-file=no -O0 -e 1
  Time (mean ± σ):      91.0 ms ±   0.5 ms    [User: 65.8 ms, System: 42.2 ms]
  Range (min … max):    89.8 ms …  92.4 ms    50 runs

Now let's see how much it costs to just do using LAME_jll:

seress:/tmp/juliadepot$ JULIA_DEPOT_PATH=/tmp/juliadepot hyperfine 'julia-1.5.0-beta1 --startup-file=no -e "using LAME_jll"' --warmup 2 -r 50
Benchmark #1: julia-1.5.0-beta1 --startup-file=no -e "using LAME_jll"
  Time (mean ± σ):     278.5 ms ±   1.2 ms    [User: 420.2 ms, System: 409.1 ms]
  Range (min … max):   277.0 ms … 283.5 ms    50 runs

seress:/tmp/juliadepot$ JULIA_DEPOT_PATH=/tmp/juliadepot hyperfine 'julia-1.5.0-beta1 --startup-file=no -O1 -e "using LAME_jll"' --warmup 2 -r 50
Benchmark #1: julia-1.5.0-beta1 --startup-file=no -O1 -e "using LAME_jll"
  Time (mean ± σ):     219.8 ms ±   0.8 ms    [User: 358.0 ms, System: 413.9 ms]
  Range (min … max):   218.1 ms … 222.1 ms    50 runs

seress:/tmp/juliadepot$ JULIA_DEPOT_PATH=/tmp/juliadepot hyperfine 'julia-1.5.0-beta1 --startup-file=no -O0 -e "using LAME_jll"' --warmup 2 -r 50
Benchmark #1: julia-1.5.0-beta1 --startup-file=no -O0 -e "using LAME_jll"
  Time (mean ± σ):     216.1 ms ±   0.6 ms    [User: 354.6 ms, System: 413.0 ms]
  Range (min … max):   214.8 ms … 217.5 ms    50 runs

Finally, I edited /tmp/juliadepot/packages/LAME_jll/K2IiX/src/LAME_jll.jl to insert the code change from this PR and repeated the above:

seress:/tmp/juliadepot$ JULIA_DEPOT_PATH=/tmp/juliadepot hyperfine 'julia-1.5.0-beta1 --startup-file=no -e "using LAME_jll"' --warmup 2 -r 50
Benchmark #1: julia-1.5.0-beta1 --startup-file=no -e "using LAME_jll"
  Time (mean ± σ):     268.7 ms ±   0.7 ms    [User: 408.2 ms, System: 411.7 ms]
  Range (min … max):   267.3 ms … 270.1 ms    50 runs

seress:/tmp/juliadepot$ JULIA_DEPOT_PATH=/tmp/juliadepot hyperfine 'julia-1.5.0-beta1 --startup-file=no -O1 -e "using LAME_jll"' --warmup 2 -r 50
Benchmark #1: julia-1.5.0-beta1 --startup-file=no -O1 -e "using LAME_jll"
  Time (mean ± σ):     220.6 ms ±   0.9 ms    [User: 359.3 ms, System: 412.9 ms]
  Range (min … max):   219.1 ms … 223.7 ms    50 runs

seress:/tmp/juliadepot$ JULIA_DEPOT_PATH=/tmp/juliadepot hyperfine 'julia-1.5.0-beta1 --startup-file=no -O0 -e "using LAME_jll"' --warmup 2 -r 50
Benchmark #1: julia-1.5.0-beta1 --startup-file=no -O0 -e "using LAME_jll"
  Time (mean ± σ):     217.3 ms ±   0.8 ms    [User: 360.2 ms, System: 408.2 ms]
  Range (min … max):   215.9 ms … 219.8 ms    50 runs

As we can see, there is a slight improvement at the default optimization level: from 278.5 ms ± 1.2 ms down to 268.7 ms ± 0.7 ms. But it is much smaller than the improvement from going to -O1 where we see 219.8 ms ± 0.8 ms. The difference cannot be explained by the difference in the startup speed, which is only about 3-4 ms.

@fingolfin
Copy link
Member

I've repeated this with FFMPEG_jll now, using a crude trick to modify all the involved JLL packages to use the patch from this PR:

for d in *_jll ; do sed -E -i -e 's/(module '$d')/\1\n\nif isdefined(Base, :Experimental) \&\& isdefined(Base.Experimental, Symbol("@optlevel"))\n  @eval Base.Experimental.@optlevel 0\nend\n/' $d/*/src/$d.jl ; done

Since this is quite a bit slower, I only used 20 runs.
Before the patch:

seress:/tmp/juliadepot$ JULIA_DEPOT_PATH=/tmp/juliadepot hyperfine 'julia-1.5.0-beta1 --startup-file=no -e "using FFMPEG_jll"' --warmup 2 -r 20
Benchmark #1: julia-1.5.0-beta1 --startup-file=no -e "using FFMPEG_jll"
  Time (mean ± σ):      1.145 s ±  0.004 s    [User: 1.263 s, System: 0.433 s]
  Range (min … max):    1.140 s …  1.153 s    20 runs

seress:/tmp/juliadepot$ JULIA_DEPOT_PATH=/tmp/juliadepot hyperfine 'julia-1.5.0-beta1 --startup-file=no -O1 -e "using FFMPEG_jll"' --warmup 2 -r 20
Benchmark #1: julia-1.5.0-beta1 --startup-file=no -O1 -e "using FFMPEG_jll"
  Time (mean ± σ):     907.3 ms ±   2.1 ms    [User: 1.022 s, System: 0.436 s]
  Range (min … max):   904.7 ms … 912.9 ms    20 runs

seress:/tmp/juliadepot$ JULIA_DEPOT_PATH=/tmp/juliadepot hyperfine 'julia-1.5.0-beta1 --startup-file=no -O0 -e "using FFMPEG_jll"' --warmup 2 -r 20
Benchmark #1: julia-1.5.0-beta1 --startup-file=no -O0 -e "using FFMPEG_jll"
  Time (mean ± σ):     890.6 ms ±   3.4 ms    [User: 1.010 s, System: 0.432 s]
  Range (min … max):   886.7 ms … 897.9 ms    20 runs

After the patch:

seress:/tmp/juliadepot/packages$ JULIA_DEPOT_PATH=/tmp/juliadepot hyperfine 'julia-1.5.0-beta1 --startup-file=no -e "using FFMPEG_jll"' --warmup 2 -r 20
Benchmark #1: julia-1.5.0-beta1 --startup-file=no -e "using FFMPEG_jll"
  Time (mean ± σ):     967.8 ms ±   3.7 ms    [User: 1.086 s, System: 0.432 s]
  Range (min … max):   963.3 ms … 980.1 ms    20 runs

seress:/tmp/juliadepot/packages$ JULIA_DEPOT_PATH=/tmp/juliadepot hyperfine 'julia-1.5.0-beta1 --startup-file=no -O1 -e "using FFMPEG_jll"' --warmup 2 -r 20
Benchmark #1: julia-1.5.0-beta1 --startup-file=no -O1 -e "using FFMPEG_jll"
  Time (mean ± σ):     896.5 ms ±   2.5 ms    [User: 1.011 s, System: 0.436 s]
  Range (min … max):   891.8 ms … 900.1 ms    20 runs

seress:/tmp/juliadepot/packages$ JULIA_DEPOT_PATH=/tmp/juliadepot hyperfine 'julia-1.5.0-beta1 --startup-file=no -O0 -e "using FFMPEG_jll"' --warmup 2 -r 20
Benchmark #1: julia-1.5.0-beta1 --startup-file=no -O0 -e "using FFMPEG_jll"
  Time (mean ± σ):     892.6 ms ±   1.9 ms    [User: 1.009 s, System: 0.436 s]
  Range (min … max):   889.1 ms … 897.0 ms    20 runs

A reduction from 1145 ms ± 4 ms down to 967.8 ms ± 3.7 ms, i.e., a saving of ~177 ms or about 15%,, indeed seems quite compelling. Of course 968ms is still long. Oh and if one could get even closer to what -O1/-O0 give, that'd be another 70 ms.

@PallHaraldsson
Copy link
Contributor Author

I'm not on a laptop, yes noisy (with browser running), so I rely on min. time, but anyway it seems you're convinced. It doesn't have to be a lot of improvement, any helps, and for JLLs I do not see a potential for a downside (always same generated code, that doesn't rely on optimizer).

@PallHaraldsson
Copy link
Contributor Author

Since Julia 1.5 RC1 is imminent (?), no longer blocked for merging, I think we should go for this, while we still have time for the final release of 1.5.

@fingolfin
Copy link
Member

@PallHaraldsson I don't see why the release of Julia 1.5 matters for this? Most JLLs are built via Yggdrasil. And they won't be regenerated when this PR is merged either -- each one will have to be manually updated. But this is independent of Julia 1.5.

@PallHaraldsson
Copy link
Contributor Author

PallHaraldsson commented Jun 16, 2020

Hmm, I assumed would be regenerated for Yggdrasil. Or that would be helpful... if it's possible to make that happen. Otherwise, people will see gradual decrease of startup time for JLL packages, as JLL dependencies get updated to (some might never?).

@staticfloat staticfloat merged commit 0701e99 into JuliaPackaging:master Jun 18, 2020
@staticfloat
Copy link
Member

Thanks @PallHaraldsson!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants