Faster incremental sysimg rebuilds #40414

Keno · 2021-04-09T04:06:26Z

Faster incremental sysimg rebuilds

Recent improvements in precompilation have significantly reduced
compile time issues like ttfp (time to first plot). However, it is
still significantly faster to just build a system image,
in which case ttfp is basically instant. The difference is
primarily due to us not being able to store native code in
.ji files, as well as invalidations of previously loaded code
requiring recompilation. In the long term these issues can
be overcome, but in the short term, I think we should try
to leverage system images more heavily, since they already
basically solve the problem. I believe the reason people
aren't really using system images is three-fold:

1. System images take a long time to compile
2. The system image workflow is pretty manual
3. System images don't play well with pkg updates etc.

Thus, my evil plan to improve the situation is:

1. Make system images build faster
2. Add an autoload annotation to Project.toml files.
   If present, julia will hash the manifest and look
   for any matching system image in ~/.julia/sysimages.
3. Make system images build even faster

The idea is that for the standard workflow, where people
just use plain `julia` with the default environment or
`julia --project`, system images would just be loaded
automatically, thus reducing the barrier to entry.

In the initial version, there is no automatic rebuild
of these system images. They would still be built manually
with PackageCompiler, but at least the loading side would
be automatic, and hopefully the build will be fast enough
that people will actually be willing to wait.
Eventually the rebuild could also be automatic
(maybe even in the background).

The major drawback of this plan is that system images will
start with all packages already loaded (even if their
bindings aren't present in Main). This will require some
workflow adjustments. I think it'll probably turn out fine,
but it's worth highlighting.

This PR is step 1 in this direction. It provides the ability
to rebuild system images much faster. The key observation
is that most of the time in sysimage build is spent in LLVM
generating native code (serializing julia's data structures
is quite fast). Thus if we can re-use the code already
generated for the system image we're currently running, we'll
save a fair amount of time.

Unfortunately, this is not 100% straightforward since we were
assuming that no linking happens in a number of places. This
PR hacks around that, but it is not a particularly satisfying
long term solution. That said, it should work fine, and I think
it's worth doing, so that we can explore the workflow
adjustments that would rely on this.

With that said, here's how to use this (at the low level; of
course PackageCompiler would just handle this):

```shell
$ mkdir chained
$ time ./usr/bin/julia --sysimage-native-code=chained --sysimage=usr/lib/julia/sys.so --output-o chained/chained.o.a -e 'Base.__init_build();'
real	0m9.633s
user	0m8.613s
sys	0m1.020s
$ cd chained
$ cp ../usr/lib/julia/sys-o.a . # Get the -o.a from the old sysimage
$ ar x sys-o.a # Extract it into text.o and data.o
$ rm data.o # rm the serialized sysimg data
$ mv text.o text-old.o
$ llvm-objcopy --remove-section .data.jl.sysimg_link text-old.o # rm the link between the native code and the old sysimg data
$ ar x chained.o.a # Extract new sysimage files
$ gcc -shared -o chained.so text.o data.o text-old.o # Link everything
$ ../julia --sysimage=chained.so
```

As can be seen, regenerating the system image took about 9s (the
subsequent commands aren't timed here, but take less than a second total).
This compares very favorably with a non-chained sysimg rebuild:

```shell
time ./usr/bin/julia --sysimage=usr/lib/julia/sys.so --output-o nonchained.o.a -e 'Base.__init_build();'

real	2m42.667s
user	2m39.211s
sys	0m3.452s
```

Of course, if you do load additional packages, the extra code
does still need to be compiled, so e.g. building a system image
for Plots goes from 3 mins to 1 min (building all of Plots,
plus everything in Base that got invalidated). That is still all in
LLVM though, so it should be relatively straightforward to
multithread that after this PR (since linking the sysimg
in multiple pieces is allowed). That part is not implemented
yet though.
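
For scale, the two `time` results quoted above work out to roughly a 17x speedup for the plain rebuild (and the Plots case, 3 mins to 1 min, to about 3x). A minimal sketch of that arithmetic follows; the `to_seconds` helper is hypothetical and only parses the `XmY.YYYs` format that `time` prints:

```shell
#!/bin/sh
# Convert a `time`-style duration such as "2m42.667s" into seconds.
# Hypothetical helper for illustration; not part of the PR.
to_seconds() {
    min="${1%%m*}"   # text before the "m", e.g. "2"
    sec="${1#*m}"    # text after the "m", e.g. "42.667s"
    sec="${sec%s}"   # drop the trailing "s"
    awk -v m="$min" -v s="$sec" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

chained=$(to_seconds "0m9.633s")    # "real" time of the chained rebuild
full=$(to_seconds "2m42.667s")      # "real" time of the non-chained rebuild
speedup=$(awk -v a="$full" -v b="$chained" 'BEGIN { printf "%.1f\n", a / b }')

echo "chained: ${chained}s, full: ${full}s, speedup: ${speedup}x"
```

The figures are single wall-clock measurements from one machine, so the ratio is indicative rather than a benchmark.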

NHDaly · 2021-04-12T15:34:29Z

Amazing! Thank you for tackling this, @Keno! It sounds very exciting!

In case you haven't seen this already, regarding speeding up PackageCompiler, I've asked in the past about why PackageCompiler currently ends up compiling everything to native code twice, where we round-trip through a text file to record the compilations: JuliaLang/PackageCompiler.jl#486. Just wanted to float this past you in case you hadn't seen it. Kristoffer provided a good explanation, but it seems like something that could be improved with a bit of design work.

Keno · 2021-04-12T23:26:40Z

This can basically address that as well. You can build the whole sysimg without precompiles, dump out your precompiles using that image, and then use this mechanism to build a chained sysimg as before, only much faster.

Keno · 2021-04-13T03:37:42Z

@timholy one application that comes to mind is speeding up development of Base itself. Could we have a mode of Revise where it serializes what needs to be Revised in some easy to load format, and then use this to quickly update an existing system image (without Revise itself showing up in the system image)?

This is the second part of the plan described in #40414 (though complementary to the PR itself). In particular, this PR makes it possible to quickly replace a system image during initial startup. This is done by adding a hook early in the startup sequence (after the system image, but before any dependent libraries are initialized) for Julia to look at the specified project file and decide to load a different sysimage instead. In the current version of the PR, this works as follows:

- If the `--autoload` argument is specified, julia will hash the contents of the currently active project's manifest path.
- If a corresponding .so is found in `~/.julia/sysimages`, it will load that sysimage instead.
- If not, loading will proceed as usual and a warning is generated, but before any user code is run, Julia will `require` any dependencies specified in the Project.toml.

The third point is there so that, independent of whether or not the system image is found, the environment upon transfer of control to the user is always the same (e.g. a package may have type-pirated a method, which is available independent of whether the user ever explicitly did `using`).

This is highly incomplete. In particular, the scheme to find the system image needs to take account of preferences and should probably exclude any packages that are `dev`'ed (or their dependents). I'm not sure I'll have the time to get around to finishing this, but I'm hoping somebody else would be willing to jump in for that part. The underlying mechanism seems to work fine at this point, so this work should be mostly confined to loading.jl.
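
The lookup step described above could be sketched roughly as follows. This is not the PR's implementation (which lives in Julia's startup code); the choice of SHA-256, the `<hash>.so` file naming, the `manifest_hash` and `find_sysimage` helpers, and the availability of `sha256sum` are all assumptions made for illustration:

```shell
#!/bin/sh
# Sketch of the sysimage lookup only. Assumptions (not from the PR):
# SHA-256 as the hash, "<hash>.so" as the file name, sha256sum installed.

# Hash the contents of a manifest file.
manifest_hash() {
    sha256sum "$1" | cut -d' ' -f1
}

# Print the matching sysimage path and return 0, or warn and return 1.
# $1 = manifest path, $2 = depot directory (defaults to ~/.julia).
find_sysimage() {
    manifest="$1"
    depot="${2:-$HOME/.julia}"
    if [ ! -f "$manifest" ]; then
        echo "no manifest at $manifest" >&2
        return 2
    fi
    candidate="$depot/sysimages/$(manifest_hash "$manifest").so"
    if [ -f "$candidate" ]; then
        echo "$candidate"
        return 0
    fi
    echo "warning: no sysimage matches $manifest" >&2
    return 1
}

# Demo in a throwaway directory so the real depot is untouched.
tmp=$(mktemp -d)
echo 'placeholder manifest contents' > "$tmp/Manifest.toml"
mkdir -p "$tmp/depot/sysimages"

# First lookup misses: nothing has been built for this manifest yet.
find_sysimage "$tmp/Manifest.toml" "$tmp/depot"; miss_status=$?

# Pretend a sysimage was built, then look it up again.
touch "$tmp/depot/sysimages/$(manifest_hash "$tmp/Manifest.toml").so"
hit=$(find_sysimage "$tmp/Manifest.toml" "$tmp/depot"); hit_status=$?

echo "miss status: $miss_status, hit status: $hit_status"
echo "found: $hit"
```

Because the key is derived from the manifest contents, any change to the resolved dependencies produces a different hash and therefore a miss, which is the point at which the fallback path (warn, then `require` the project's dependencies) would apply.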

The multiversioning pass currently does two things:

- Clone all functions and create a set of tables to tell the sysimage loader where to find the various cloned functions.
- Compress the table of pointers by going from 64 bit pointers to 32 bit offsets from the first function of the .text section.

The second optimization is useful, because it cuts down on space and speeds up dynamic loading. Unfortunately, relocations of this kind are not expressible in all object formats, and as a result this scheme does not work if the table needs to describe function pointers in multiple compilation units. I'm working on improving the performance of incremental system image rebuilds, which would rely on being able to re-link such system images and is thus incompatible with this compression. There are possible ways to make it compatible, namely:

- Add a relocation to all the relevant file formats that expresses offsets from the start of the section, or,
- Change the multiversion table to be pcrel rather than relative to the first function in the table.

The first would require some significant coordination with standards bodies, and both are currently not supported in LLVM. To make progress on this issue, simply make the multiversion pass optional and keep the table uncompressed in this case. This wastes some space and adds a few fractions of a second to the system image load time, but it should let us proceed on the incremental sysimage project. If it works well, we can go back and consider the future of the multiversioning tables.
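
To make the compression concrete, here is a toy sketch of the arithmetic with made-up addresses (real ones come from the linker and loader). Treating the offsets as unsigned 32-bit values is an assumption for illustration. It shows the size saving and the requirement that every entry be expressible relative to one base; it does not model the relocation problem itself:

```shell
#!/bin/sh
# Toy illustration: 64-bit function pointers vs 32-bit offsets from the
# first function. All addresses are hypothetical.
addrs="0x7f3a10001000 0x7f3a10001040 0x7f3a100010c0 0x7f3a10002a00"
base=${addrs%% *}   # the first entry acts as the base address

offsets=""
count=0
for a in $addrs; do
    off=$(( $a - $base ))
    # Every offset must fit in 32 bits, or the compressed table is invalid.
    if [ "$off" -lt 0 ] || [ "$off" -gt 4294967295 ]; then
        echo "offset $off does not fit in 32 bits" >&2
        exit 1
    fi
    offsets="${offsets:+$offsets }$off"
    count=$(( count + 1 ))
done

full_bytes=$(( count * 8 ))     # table of 64-bit pointers
packed_bytes=$(( count * 4 ))   # table of 32-bit offsets

echo "offsets: $offsets"
echo "table size: $full_bytes bytes uncompressed, $packed_bytes bytes compressed"
```

Keeping the table uncompressed, as the commit proposes, corresponds to storing the full 8-byte entries and skipping the subtraction entirely.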

This commit provides the ability to rebuild system images much faster. The key observation is that most of the time in sysimage build is spent in LLVM generating native code (serializing julia's data structures is quite fast). Thus if we can re-use the code already generated for the system image we're currently running, we'll save a fair amount of time.

Unfortunately, this is not 100% straightforward since we were assuming that no linking happens in a number of places. This PR hacks around that, but it is not a particularly satisfying long term solution. That said, it should work fine, and I think it's worth doing, so that we can explore the workflow adjustments that would rely on this.

With that said, here's how to use this (at the low level; of course PackageCompiler would just handle this):

```shell
$ mkdir chained
$ time ./usr/bin/julia --sysimage-native-code=chained --sysimage=usr/lib/julia/sys.so --output-o chained/chained.o.a -e 'Base.__init_build();'
real 0m9.633s
user 0m8.613s
sys 0m1.020s
$ cd chained
$ cp ../usr/lib/julia/sys-o.a . # Get the -o.a from the old sysimage
$ ar x sys-o.a # Extract it into text.o and data.o
$ rm data.o # rm the serialized sysimg data
$ mv text.o text-old.o
$ llvm-objcopy --remove-section .data.jl.unique text-old.o # rm the link between the native code and the old sysimg data
$ ar x chained.o.a # Extract new sysimage files
$ gcc -shared -o chained.so text.o data.o text-old.o # Link everything
$ ../julia --sysimage=chained.so
```

As can be seen, regenerating the system image took about 9s (the subsequent commands aren't timed here, but take less than a second total). This compares very favorably with a non-chained sysimg rebuild:

```
time ./usr/bin/julia --sysimage=usr/lib/julia/sys.so --output-o nonchained.o.a -e 'Base.__init_build();'
real 2m42.667s
user 2m39.211s
sys 0m3.452s
```

Of course, if you do load additional packages, the extra code does still need to be compiled, so e.g. building a system image for `Plots` goes from 3 mins to 1 min (building all of Plots, plus everything in Base that got invalidated). That is still all in LLVM though, so it should be relatively straightforward to multithread that after this PR (since linking the sysimg in multiple pieces is allowed). That part is not implemented yet though.

timholy · 2021-08-22T11:33:58Z

Just noticed #40414 (comment). Sure, that would be pretty easy to do in principle. What exactly would it look like?

File an issue at Revise when you want this; my impression is that we're not yet at a place where this will make a difference.

timholy · 2021-08-22T11:47:11Z

I have to say, this triggers my love/hate relationship with https://github.com/JuliaLang/PackageCompiler.jl. I totally get why it's necessary to have it, but at the same time its existence is probably what's let us get away for so long without just implementing native-code caching in package .ji files. I'd rather just fix that. Are we really so far from that goal? It just doesn't seem like it should be all that insurmountable. I'm on a bit of a close-the-precompile-issues rampage right now. There really aren't that many issues per se, but we'll still need some things (the method.roots problem #32705, and the issue of whether to store non-internal CodeInstances) that are pretty big.

jpsamaroo · 2021-08-23T13:55:53Z

Alternatively, could we put the native code into shared libraries, and load them when we load .ji files? It could improve the situation of calling Julia from C, since we'd have an obvious place to emit ccall-able methods, and they could potentially remain available without the runtime being started (if they don't depend on the runtime).

timholy · 2021-08-23T14:58:14Z

I'm not really sure of the right implementation, mostly because I've never actually looked at the format of a shared library file. But that seems pretty sensible. Once we can cache external MethodInstances & CodeInstances (in our current no-native-code format), AFAICT the main remaining job is doing the work of the linker. If we can rely on external tools, that seems likely to be a win.

ViralBShah · 2024-08-08T10:54:51Z

@Keno is this the PR you said can be rebased and brought back?

Keno force-pushed the kf/fastsysimg branch 3 times, most recently from c2d1343 to 86c959e Compare April 10, 2021 01:00

kshyatt added the building (Build system, or building Julia or its dependencies) and performance (Must go faster) labels Apr 13, 2021

Keno mentioned this pull request Apr 13, 2021

WIP: Add sysimage autoload mechanism #40472

Closed

Keno added 2 commits April 13, 2021 18:40

Keno force-pushed the kf/fastsysimg branch from 86c959e to 0d4c974 Compare April 14, 2021 01:41

jpsamaroo mentioned this pull request Sep 9, 2021

GPUCompiler rewrite tshort/StaticCompiler.jl#43

Closed

petvana added a commit to petvana/julia that referenced this pull request Jul 14, 2022

Rebase JuliaLang#40414 and update for Julia v1.9 + add test

49772cc

petvana mentioned this pull request Jul 15, 2022

Faster incremental rebuilds of (user-specific) sysimgs #46045

Closed

6 tasks

vtjnash closed this Jan 2, 2023

Keno mentioned this pull request Sep 22, 2023

cli: Add infrastructure for new CLI drivers juliax and juliac #51417

Open

giordano deleted the kf/fastsysimg branch August 8, 2024 11:09
