
Generate multiple packages with a single builder #778

Open
giordano opened this issue May 6, 2020 · 10 comments

Comments

@giordano
Member

giordano commented May 6, 2020

From time to time I look at the package managers of Linux distributions to see if we can pick up some interesting ideas. One thing that I think would be really cool to have here is the ability to generate multiple JLL packages with a single builder: the result of a build wouldn't go into a single tarball, but could be split into many of them.

One fancy application would be the ability to generate:

  • Libfoo_jll: contains only bin/ and lib/; this is the runtime part, i.e. what Julia packages will use;
  • Libfoo_dev_jll: contains include/; header files are generally useless for Julia packages and mostly clutter ~/.julia/artifacts/ with dozens of small files. Ideally, this would be automatically installed, if available, when Libfoo_jll is used as a dependency in a build;
  • Libfoo_dbg_jll: contains the debug symbols of the shared library, which users can optionally install to get more useful debug information about crashes or errors. Based on an idea by @Keno.

Also, I think that LLVM_full_jll is currently "wrong": IMO it should simply be an empty metapackage binding all the other pieces. Instead, it's now a monster package containing the same data as its pieces, which means that if we use both LLVM_full_jll and libLLVM_jll in a build, they would step on each other's toes. Having a single builder that produces all the other subpackages would probably make @vchuravy happy, too.

@Keno
Contributor

Keno commented May 6, 2020

Maybe also a JLL for the original source directory, so that we can download it automatically in the debugger if necessary.

@staticfloat
Member

I agree that this is desirable. I'm not entirely sure that the right way to do it is to create multiple JLL packages; or at least, that's not necessarily the user-facing way.

Here are my thoughts:

  • For some projects, we have the genuine desire to split a JLL into multiple independent packages; Clang_jll and LLVM_jll really don't have anything to do with each other; sometimes you may want Clang_jll and not LLVM_jll, and vice versa. The fact that they both stem from the same build process is more or less an implementation detail. (Oh, and they both rely upon LibLLVM_jll, but that's fine.) To save on build time/duplicated effort, it would be nice to be able to split a single build_tarballs()'s output into multiple, independent JLL packages.

  • For most projects in existence, we have a mixture of files: some things are generally essential (dynamic libraries, executables), some are nice to have (include files, external debugging symbols), and some are almost never needed (static libraries). It would be nice to be able to split a single build_tarballs()'s output into different "configurations", making use of JuliaLang/Pkg.jl#1780 (Expand artifact selection beyond Platform/CompilerABI).

First and foremost, for this to work nicely, I think we're going to need to break up build_tarballs() a bit; right now we have everything built with the very deep assumption that we can flow smoothly from sources to JLLs, but that breaks down in a few places such as IntelOpenMP, CUDA, LLVM, MKL, etc... I think we need to split build_tarballs() up into two separate pieces: the piece where we call autobuild() as many times as we need, generating unpacked prefixes of build products, then the piece where we carve those prefixes up into JLLs. We can, of course, continue to expose a build_tarballs() that does all that automatically, but we need to have a re-think of the underlying mechanisms to make this effortless.

API overview

I envision having a function build_binaries!() that we call in a similar manner to build_tarballs(), but it doesn't take in name, version or products; all it does is build unpacked prefixes, and return the meta information about that build, coalesced into a single meta object:

meta = BuildMetadata()
build_binaries!(meta, ARGS, sources, script, platforms, dependencies)

This meta object will contain all the information we're used to having in e.g. the JSON object (and in fact will be what we serialize with --json-meta in the future; this will make it much easier to understand how we mock out parts of the BB pipeline when running on Yggdrasil), and is what we will use when we perform the second step, which is extraction and JLL construction:

# This would be defined by default, but just explicitly make it for illustration's sake
everything_extractor = raw"""
mv ${srcdir}/* ${prefix}/
"""
build_jll!(meta, name, version, platforms, dependencies, everything_extractor)
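
To make the split concrete, here is a rough sketch of how the existing build_tarballs() entry point could remain as a thin wrapper over the two new pieces (names follow the proposal above; the exact signatures are hypothetical, and handling of products is elided):

# Sketch only: keep build_tarballs() as a convenience wrapper that chains the two
# proposed steps with the default "everything" extractor.
function build_tarballs(ARGS, name, version, sources, script, platforms, products, dependencies)
    meta = BuildMetadata()
    build_binaries!(meta, ARGS, sources, script, platforms, dependencies)
    return build_jll!(meta, name, version, platforms, dependencies, everything_extractor)
end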

What this enables

This gives us the flexibility to do an awful lot:

  • Do the "fancy toys" trick to support separate sources/dependencies/whatever per-platform:
meta = BuildMetadata()
build_binaries!(meta, ARGS, sources, script, filter(p -> !Sys.iswindows(p), platforms), dependencies)
build_binaries!(meta, ARGS, win_sources, win_script, filter(p -> Sys.iswindows(p), platforms), dependencies)
build_jll!(meta, name, version, platforms, dependencies, everything_extractor)

Note that if we're going through the trouble of rewriting this stuff, we can probably get rid of should_build_platform() in fancy toys by doing that filtering automatically inside of build_binaries!(); e.g. if a platform is given within ARGS, use that to filter the passed-in platforms, and if there's nothing left, return early (see the sketch below).
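
A minimal sketch of that automatic filtering, assuming triplet() as the usual platform-triplet accessor (the helper name requested_platforms is made up for illustration):

# Keep only the platforms requested on the command line; if none match, there is
# nothing for this invocation to do and we can return early.
function requested_platforms(ARGS, platforms)
    requested = filter(a -> !startswith(a, "--"), ARGS)  # ignore flags like --verbose/--deploy
    isempty(requested) && return platforms
    return filter(p -> triplet(p) in requested, platforms)
end

function build_binaries!(meta, ARGS, sources, script, platforms, dependencies)
    platforms = requested_platforms(ARGS, platforms)
    isempty(platforms) && return meta   # nothing to build for this invocation
    # ... proceed with the autobuild() calls ...
end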

  • Split a single build into multiple JLL packages:
meta = BuildMetadata()
build_binaries!(meta, ARGS, sources, script, platforms, dependencies)

LibLLVM_extractor = raw"""
# Copy over `llvm-config`, `libLLVM` and `include`, specifically.
mkdir -p ${prefix}/include ${prefix}/tools ${libdir} ${prefix}/lib
mv -v ${srcdir}/include/llvm* ${prefix}/include/
mv -v ${srcdir}/tools/llvm-config* ${prefix}/tools/
mv -v ${srcdir}/$(basename ${libdir})/*LLVM*.${dlext}* ${libdir}/
mv -v ${srcdir}/lib/*LLVM*.a ${prefix}/lib
"""
build_jll!(meta, "LibLLVM_jll", version, platforms, dependencies, LibLLVM_extractor)

Clang_extractor = raw"""
...
"""
build_jll!(meta, "Clang_jll", version, platforms, dependencies, Clang_extractor)
...

  • Build multiple "variants" of the same package:
# Build once with `-O2` and extract it into a default variant, as well as a "build" variant:
meta = BuildMetadata()
build_binaries!(meta, ARGS, sources, script, platforms, dependencies)

base_extractor = raw"""
for f in $(find_binary_objects ${srcdir}); do
    rp=$(relpath ${srcdir} ${f})
    mkdir -p $(dirname ${rp})
    mv ${f} ${prefix}/${rp}
done
"""
build_jll!(meta, name, version, platforms, dependencies, base_extractor)
build_jll!(meta, "$(name)+build", version, platforms, dependencies, everything_extractor)

# Build once with `-O1 -g` and bundle it into the "debug" variant:
debug_meta = BuildMetadata()
build_binaries!(debug_meta, ARGS, sources, debug_script, platforms, dependencies)
build_jll!(debug_meta, "$(name)+debug", version, platforms, dependencies, everything_extractor)

The "variants" would all be put into the same JLL release as artifacts with names that have the + postpended (as that's not a valid JLL name, of course), and we'd have ways for the user to request which artifacts get installed on their system through things like JuliaLang/Pkg.jl#1780. We could arbitrarily decide that BB itself always installs the +build variant (if available) into the prefix when building. (Or we could even allow for Depedency() objects to provide a variant kwarg)

What do you guys think?

@vchuravy
Member

That sounds fantastic! I would call them build! and generate!/package!

@staticfloat
Member

staticfloat commented Jun 20, 2020

Intelligent building and serving of debug symbols

If we had "partial artifact" download support, we could simplify this a bit, in that we could generate only a single tarball that has everything: binaries, headers, and separate debug files. We could then work some Pkg server magic to allow requesting a union of subtrees rather than always the entire content tree. This would allow us to, for instance, request the union of subtrees that corresponds to just the shared libraries within lib/ and the binaries within bin. The PkgServer would then generate a cut-down tarball containing those resources and pass it down to us. This would be the "minimal" artifact variant, while the "build" variant would include things like headers, static libraries, etc... Finally, the "full" variant would include external debug symbols that were stripped out from the executables during build.

To strip out debug symbols into external files, we can use the following tools:

  • For ELF and COFF files:
objcopy --only-keep-debug $file $debug_file
strip --strip-debug --strip-unneeded $file
objcopy --add-gnu-debuglink $debug_file $file

Assuming we are able to work our PkgServer magic above, we will be able to stream down content trees where these files exist on-disk side-by-side, which makes the whole thing much easier. If we must keep the files separate, this becomes more difficult: we'd probably have to modify files on-disk to get relative pathing correct, or force debuggers to do the searching themselves (this is easier if we embed build IDs, see below).

  • For Mach-O files, we can use dsymutil to create .dSYM bundles (or files, if we want, by passing in -flat):
dsymutil $file

Note that we probably want to start adding --build-id=sha1 to our LDFLAGS to aid in debugging efforts, as that allows for easier matching of files.
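
For concreteness, a builder could already opt into this in its own script; a minimal sketch (the flag spelling is the GNU ld one, and libfoo is a placeholder project):

script = raw"""
# Emit build IDs so stripped binaries can be matched to their external debug files.
export LDFLAGS="${LDFLAGS} -Wl,--build-id=sha1"
cd ${WORKSPACE}/srcdir/libfoo-*
./configure --prefix=${prefix} --host=${target}
make -j${nproc}
make install
"""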

Doing this magically via compiler wrappers/BB magic

We can force -g into all compiler invocations via our compiler wrappers, and invoke dsymutil upon all executables at the end of the build if we're running on Darwin. It really should be that simple. :)
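
A hedged sketch of what that Darwin post-processing could look like as a shell fragment appended to the build (the hook itself doesn't exist yet; the dsymutil usage is standard):

darwin_dsym_step = raw"""
if [[ ${target} == *apple-darwin* ]]; then
    # Generate a .dSYM bundle next to every executable and dylib in the prefix.
    for f in ${prefix}/bin/* $(find ${libdir} -name '*.dylib'); do
        dsymutil "${f}" || true   # skip anything that isn't Mach-O
    done
fi
"""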

@staticfloat
Member

Oh, I also just thought to myself that it would be cool to switch between e.g. debug and non-debug versions through Preferences, so a JLL package would default to installing a minimal variant, but it could be opted in to a richer variant by setting a Preference in the overall Project.toml that is using the JLL.
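
A minimal sketch of that opt-in from the consuming project's side, assuming a hypothetical "variant" preference that the JLL's init code would honour (Libfoo_jll is a placeholder):

using Preferences, Libfoo_jll
# Recorded in the active project's (Local)Preferences.toml; the JLL would read it
# back with load_preference(Libfoo_jll, "variant", "minimal") when choosing an artifact.
set_preferences!(Libfoo_jll, "variant" => "debug")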

@staticfloat
Member

Thinking about this again, it would also be really sweet for debug versions of JLLs to include all source files referenced by the DWARF files, stored in a predictable place (like <$artifact_path>/src) so that we can use source-map to get lldb/gdb to find the source when we're debugging an artifact.

We can add a post-processing step that inspects all DWARF files, finds all referenced source files (even autogenerated ones) and stores them in the appropriate location within a $destdir/src directory. Then we just need a convenient way to map /workspace/srcdir => $artifact_path/src within lldb/gdb and we'll have a really slick debugging experience for our users.
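
A small sketch of the mapping step, assuming the sources end up under an artifact's src/ directory (the helper is hypothetical; the debugger commands are the standard gdb/lldb path-substitution ones):

# Print the command a user would paste into their debugger to remap the in-sandbox
# source prefix onto the sources bundled in the artifact.
function source_map_command(artifact_dir::AbstractString; debugger::Symbol = :gdb)
    src = joinpath(artifact_dir, "src")
    if debugger === :gdb
        return "set substitute-path /workspace/srcdir $(src)"
    else  # :lldb
        return "settings set target.source-map /workspace/srcdir $(src)"
    end
end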

@vchuravy
Member

I think most of our compilers support split DWARF info? https://gcc.gnu.org/wiki/DebugFissionDWP

@giordano
Member Author

giordano commented Jun 6, 2021

In the last few weeks I've been thinking about this issue again, coming up with beautiful ideas like using Preferences.jl to install debug versions of packages, only to realise that Elliot already proposed it 😞

Another idea that just came to my mind is to have the dev/debug tarballs as lazy artifacts of the same JLL package, instead of their own packages, but Elliot beat me to it again:

The "variants" would all be put into the same JLL release as artifacts with names that have the + postpended

I like this idea! In particular, I'm also thinking about splitting the logs into their own tarball. A nice benefit is that this could make the runtime tarball reproducible across multiple identical rebuilds.

One additional thing to mention is that now that we have JLLWrappers.jl we can automatically generate functions to download the additional artifacts, without having to change anything in the packages.
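
For illustration, such a generated helper might look roughly like this (the function and artifact names are hypothetical; ensure_artifact_installed is the existing Pkg.Artifacts entry point for fetching lazy artifacts):

using Pkg.Artifacts: ensure_artifact_installed

# Lazily download the "+debug" artifact of this JLL the first time it is requested.
function install_debug_artifact()
    artifacts_toml = joinpath(dirname(@__DIR__), "Artifacts.toml")
    return ensure_artifact_installed("Libfoo+debug", artifacts_toml)
end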

@fingolfin
Member

Just wanted to add that for JLLs which link against libjulia, it would be nice to have debug variants which link against libjulia-debug (this is orthogonal to the question of debug symbols and how to handle them). Right now, I am debugging a Julia package (Oscar.jl) involving four JLLs linking against libjulia (libcxxwrap-julia, libsingular-julia, libpolymake-julia, GAP) and it isn't exactly fun.

So perhaps there could be another "variant marker" indicating "download this instead of the default if this is a Julia debug build".

@vchuravy
Member

I just came across debuginfod which seems very complementary. It allows for gdb and others to auto-fetch debuginfo!
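
For example, if a Julia-ecosystem debuginfod server ever existed, pointing gdb at it would only require setting an environment variable (the URL below is purely hypothetical):

# DEBUGINFOD_URLS is honoured by gdb and the elfutils tooling.
ENV["DEBUGINFOD_URLS"] = "https://debuginfod.julialang.org/"   # hypothetical server
run(`gdb --args julia --project=. script.jl`)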

giordano added a commit to giordano/BinaryBuilder.jl that referenced this issue Sep 18, 2022
giordano added a commit that referenced this issue Sep 19, 2022
* [AutoBuild] Split log files into a separate tarball

Long term plan is still to do #778, but that's a lot of work, and we have little
time at the moment.  Simply splitting the logs into a separate tarball, instead,
is a simpler short-term solution which has some very important bonuses:

- the git tree hash of the main tarball should be reproducible (as long as you
use the same toolchain)
- which means we can start testing reproducibility of builds
- rebuilding a package without changing its content won't create a new artifact:
  less strain on the Package Storage Server.

* [AutoBuild] Filter out logs tarball when rebuilding the package