Skip to content

Latest commit

 

History

History
541 lines (432 loc) · 26 KB

0403-cargo-build-command.md

File metadata and controls

541 lines (432 loc) · 26 KB

Summary

Overhaul the build command internally and establish a number of conventions around build commands to facilitate linking native code to Cargo packages.

  1. Instead of having the build command be some form of script, it will be a Rust command instead
  2. Establish a namespace of foo-sys packages which represent the native library foo. These packages will have Cargo-based dependencies between *-sys packages to express dependencies among C packages themselves.
  3. Establish a set of standard environment variables for build commands which will instruct how foo-sys packages should be built in terms of dynamic or static linkage, as well as providing the ability to override where a package comes from via environment variables.

Motivation

Building native code is normally quite a tricky business, and the original design of Cargo was to essentially punt on this problem. Today's "solution" involves invoking an arbitrary build command in a sort of pseudo-shell with a number of predefined environment variables. This ad-hoc solution was known to be lacking at the time of implementing with the intention of identifying major pain points over time and revisiting the design once we had more information.

While today's "hands off approach" certainly has a number of drawbacks, one of the upsides is that Cargo minimizes the amount of logic inside it as much as possible. This proposal attempts to stress this point as much as possible by providing a strong foundation on which to build robust build scripts, but not baking all of the logic into Cargo itself.

The time has now come to revisit the design, and some of the largest pain points that have been identified are:

  1. Packages needs the ability to build differently on different platforms.
  2. Projects should be able to control dynamic vs static at the top level. Note that the term "project" here means "top level package".
  3. It should be possible to use libraries of build tool functionality. Cargo is indeed a package manager after all, and currently there is no way share a common set of build tool functionality among different Cargo packages.
  4. There is very little flexibility in locating packages, be it on the system, in a build directory, or in a home build dir.
  5. There is no way for two Rust packages to declare that they depend on the same native dependency.
  6. There is no way for C libraries to express their dependence on other C libraries.
  7. There is no way to encode a platform-specific dependency.

Each of these concerns can be addressed somewhat ad-hocly with a vanilla build command, but Cargo can certainly provide a more comprehensive solution to these problems.

Most of these concerns are fairly self-explanatory, but specifically (2) may require a bit more explanation:

Selecting linkage from the top level

Conceptually speaking, a native library is largely just a collections of symbols. The linkage involved in creating a final product is an implementation detail that is almost always irrelevant with respect to the symbols themselves.

When it comes to linking a native library, there are often a number of overlapping and sometimes competing concerns:

  1. Most unix-like distributions with package managers highly recommend dynamic linking of all dependencies. This reduces the overall size of an installation and allows dependencies to be updated without updating the original application.
  2. Those who distribute binaries of an application to many platforms prefer static linking as much as possible. This is largely done because the actual set of libraries on the platforms being installed on are often unknown and could be quite different than those linked to. Statically linking solves these problems by reducing the number of dependencies for an application.
  3. General developers of a package simply want a package to build at all costs. It's ok to take a little bit longer to build, but if it takes hours of googling obscure errors to figure out you needed to install libfoo it's probably not ok.
  4. Some native libraries have obscure linkage requirements. For example OpenSSL on OSX likely wants to be linked dynamically due to the special keychain support, but on linux it's more ok to statically link OpenSSL if necessary.

The key point here is that the author of a library is not the one who dictates how an application should be linked. The builder or packager of a library is the one responsible for determining how a package should be linked.

Today this is not quite how Cargo operates, depending on what flavor of syntax extension you may be using. One of the goals of this re-working is to enable top-level projects to make easier decisions about how to link to libraries, where to find linked libraries, etc.

Detailed design

Summary:

  • Add a -l flag to rustc
  • Tweak an include! macro to rustc
  • Add a links key to Cargo manifests
  • Add platform-specific dependencies to Cargo manifests
  • Allow pre-built libraries in the same manner as Cargo overrides
  • Use Rust for build scripts
  • Develop a convention of *-sys packages

Modifications to rustc

A new flag will be added to rustc:

    -l LIBRARY          Link the generated crate(s) to the specified native
                        library LIBRARY. The name `LIBRARY` will have the format
                        `kind:name` where `kind` is one of: dylib, static,
                        framework. This corresponds to the `kind` key of the
                        `#[link]` attribute. The `name` specified is the name of
                        the native library to link. The `kind:` prefix may be
                        omitted and the `dylib` format will be assumed.
rustc -l dylib:ssl -l static:z foo.rs

Native libraries often have widely varying dependencies depending on what platforms they are compiled on. Often times these dependencies aren't even constant among one platform! The reality we sadly have to face is that the dependencies of a native library itself are sometimes unknown until build time, at which point it's too late to modify the source code of the program to link to a library.

For this reason, the rustc CLI will grow the ability to link to arbitrary libraries at build time. This is motivated by the build scripts which Cargo is growing, but it likely useful for custom Rust compiles at large.

Note that this RFC does not propose style guidelines nor suggestions for usage of -l vs #[link]. For Cargo it will later recommend discouraging use of #[link], but this is not generally applicable to all Rust code in existence.

Declaration of native library dependencies

Today Cargo has very little knowledge about what dependencies are being used by a package. By knowing the exact set of dependencies, Cargo paves a way into the future to extend its handling of native dependencies, for example downloading precompiled libraries. This extension allows Cargo to better handle constraint 5 above.

[package]

# This package unconditionally links to this list of native libraries
links = ["foo", "bar"]

The key links declares that the package will link to and provide the given C libraries. Cargo will impose the restriction that the same C library must not appear more than once in a dependency graph. This will prevent the same C library from being linked multiple times to packages.

If conflicts arise from having multiple packages in a dependency graph linking to the same C library, the C dependency should be refactored into a common Cargo-packaged dependency.

It is illegal to define links without also defining build.

Platform-specific dependencies

A number of native dependencies have various dependencies depending on what platform they're building for. For example, libcurl does not depend on OpenSSL on Windows, but it is a common dependency on unix-based systems. To this end, Cargo will gain support for platform-specific dependencies, solving constraint 7 above:

[target.i686-pc-windows-gnu.dependencies.crypt32]
git = "https://github.com/user/crypt32-rs"

[target.i686-pc-windows-gnu.dependencies.winhttp]
path = "winhttp"

Here the top-level configuration key target will be a table whose sub-keys are target triples. The dependencies section underneath is the same as the top-level dependencies section in terms of functionality.

Semantically, platform specific dependencies are activated whenever Cargo is compiling for the exact target. Dependencies in other $target sections will not be compiled.

However, when generating a lockfile, Cargo will always download all dependencies unconditionally and perform resolution as if all packages were included. This is done to prevent the lockfile from radically changing depending on whether the package was last built on Linux or windows. This has the advantage of a stable lockfile, but has the drawback that all dependencies must be downloaded, even if they're not used.

Pre-built libraries

A common pain point with constraints 1, 2, and cross compilation is that it's occasionally difficult to compile a library for a particular platform. Other times it's often useful to have a copy of a library locally which is linked against instead of built or detected otherwise for debugging purposes (for example). To facilitate these pain points, Cargo will support pre-built libraries being on the system similar to how local package overrides are available.

Normal Cargo configuration will be used to specify where a library is and how it's supposed to be linked against:

# Each target triple has a namespace under the global `target` key and the
# `libs` key is a table for each native library.
#
# Each library can specify a number of key/value pairs where the values must be
# strings. The key/value pairs are metadata which are passed through to any
# native build command which depends on this library. The `rustc-flags` key is
# specially recognized as a set of flags to pass to `rustc` in order to link to
# this library.
[target.i686-unknown-linux-gnu.ssl]
rustc-flags = "-l static:ssl -L /home/build/root32/lib"
root = "/home/build/root32"

This configuration will be placed in the normal locations that .cargo/config is found. The configuration will only be queried if the target triple being built matches what's in the configuration.

Rust build scripts

First pioneered by @tomaka in rust-lang/cargo#610, the build command will no longer be an actual command, but rather a build script itself. This decision is motivated in solving constraints 1 and 3 above. The major motivation for this recommendation is the realization that the only common denominator for platforms that Cargo is running on is the fact that a Rust compiler is available. The natural conclusion from this fact is for a build script is to use Rust itself.

Furthermore, Cargo itself which serves quite well as a dependency manager, so by using Rust as a build tool it will be able to manage dependencies of the build tool itself. This will allow third-party solutions for build tools to be developed outside of Cargo itself and shared throughout the ecosystem of packages.

The concrete design of this will be the build command in the manifest being a relative path to a file in the package:

[package]
# ...
build = "build/compile.rs"

This file will be considered the entry point as a "build script" and will be built as an executable. A new top-level dependencies array, build-dependencies will be added to the manifest. These dependencies will all be available to the build script as external crates. Requiring that the build command have a separate set of dependencies solves a number of constraints:

  • When cross-compiling, the build tool as well as all of its dependencies are required to be built for the host architecture instead of the target architecture. A clear delineation will indicate precisely what dependencies need to be built for the host architecture.
  • Common packages, such as one to build cmake-based dependencies, can develop conventions around filesystem hierarchy formats to require minimum configuration to build extra code while being easily identified as having extra support code.

This RFC does not propose a convention of what to name the build script files.

Unlike links, it will be legal to specify build without specifying links. This is motivated by the code generation case study below.

Inputs

Cargo will provide a number of inputs to the build script to facilitate building native code for the current package:

  • The TARGET environment variable will contain the target triple that the native code needs to be built for. This will be passed unconditionally.
  • The NUM_JOBS environment variable will indicate the number of parallel jobs that the script itself should execute (if relevant).
  • The CARGO_MANIFEST_DIR environment variables will be the directory of the manifest of the package being built. Note that this is not the directory of the package whose build command is being run.
  • The OPT_LEVEL environment variable will contain the requested optimization level of code being built. This will be in the range 0-2. Note that this variable is the same for all build commands.
  • The PROFILE environment variable will contain the currently active Cargo profile being built. Note that this variable is the same for all build commands.
  • The DEBUG environment variable will contain true or false depending on whether the current profile specified that it should be debugged or not. Note that this variable is the same for all build commands.
  • The OUT_DIR environment variables contains the location in which all output should be placed. This should be considered a scratch area for compilations of any bundled items.
  • The CARGO_FEATURE_<foo> environment variable will be present if the feature foo is enabled. for the package being compiled.
  • The DEP_<foo>_<key> environment variables will contain metadata about the native dependencies for the current package. As the output section below will indicate, each compilation of a native library can generate a set of output metadata which will be passed through to dependencies. The only dependencies available (foo) will be those in links for immediate dependencies of the package being built. Note that each metadata key will be uppercased and - characters transformed to _ for the name of the environment variable.
  • If links is not present, then the command is unconditionally run with 0 command line arguments, otherwise:
  • The libraries that are requested via links are passed as command line arguments. The pre-built libraries in links (detailed above) will be filtered out and not passed to the build command. If there are no libraries to build (they're all pre-built), the build command will not be invoked.

Outputs

The responsibility of the build script is to ensure that all requested native libraries are available for the crate to compile. The conceptual output of the build script will be metadata on stdout explaining how the compilation went and whether it succeeded.

An example output of a build command would be:

cargo:rustc-flags=-l static:foo -L /path/to/foo
cargo:root=/path/to/foo
cargo:libdir=/path/to/foo/lib
cargo:include=/path/to/foo/include

Each line that begins with cargo: is interpreted as a line of metadata for Cargo to store. The remaining part of the line is of the form key=value (like environment variables).

This output is similar to the pre-built libraries section above in that most key/value pairs are opaque metadata except for the special rustc-flags key. The rustc-flags key indicates to Cargo necessary flags needed to link the libraries specified.

For rustc-flags specifically, Cargo will propagate all -L flags transitively to all dependencies, and -l flags to the package being built. All metadata will only be passed to immediate dependants. Note that this is recommending that #[link] is discouraged as it is not the source code's responsibility to dictate linkage.

If the build script exits with a nonzero exit code, then Cargo will consider it to have failed and will abort compilation.

Input/Output rationale

In general one of the purposes of a custom build command is to dynamically determine the necessary dependencies for a library. These dependencies may have been discovered through pkg-config, built locally, or even downloaded from a remote. This set can often change, and is the impetus for the rustc-flags metadata key. This key indicates what libraries should be linked (and how) along with where to find the libraries.

The remaining metadata flags are not as useful to rustc itself, but are quite useful to interdependencies among native packages themselves. For example libssh2 depends on OpenSSL on linux, which means it needs to find the corresponding libraries and header files. The metadata keys serve as a vector through which this information can be transmitted. The maintainer of the openssl-sys package (described below) would have a build script responsible for generating this sort of metadata so consumer packages can use it to build C libraries themselves.

A set of *-sys packages

This section will discuss a convention by which Cargo packages providing native dependencies will be named, it is not proposed to have Cargo enforce this convention via any means. These conventions are proposed to address constraints 5 and 6 above.

Common C dependencies will be refactored into a package named foo-sys where foo is the name of the C library that foo-sys will provide and link to. There are two key motivations behind this convention:

  • Each foo-sys package will declare its own dependencies on other foo-sys based packages
  • Dependencies on native libraries expressed through Cargo will be subject to version management, version locking, and deduplication as usual.

Each foo-sys package is responsible for providing the following:

  • Declarations of all symbols in a library. Essentially each foo-sys library is only a header file in terms of Rust-related code.
  • Ensuring that the native library foo is linked to the foo-sys crate. This guarantees that all exposed symbols are indeed linked into the crate.

Dependencies making use of *-sys packages will not expose extern blocks themselves, but rather use the symbols exposed in the foo-sys package directly. Additionally, packages using *-sys packages should not declare a #[link] directive to link to the native library as it's already linked to the *-sys package.

Phasing strategy

The modifications to the build command are breaking changes to Cargo. To ease the transition, the build command will be join'd to the root path of a crate, and if the file exists and ends with .rs, it will be compiled as describe above. Otherwise a warning will be printed and the fallback behavior will be executed.

The purpose of this is to help most build scripts today continue to work (but not necessarily all), and pave the way forward to implement the newer integration.

Case study: Cargo

Cargo has a surprisingly complex set of C dependencies, and this proposal has created an example repository for what the configuration of Cargo would look like with respect to its set of C dependencies.

Case study: generated code

As the release of Rust 1.0 comes closer, the use of compiler plugins has become increasingly worrying over time. It is likely that plugins will not be available by default in the stable and beta release channels of Rust. Many core Cargo packages in the ecosystem today, such as gl-rs and iron, depend on plugins to build. Others, like rust-http, are already using compile-time code generation with a build script (which this RFC will attempt to standardize on).

When taking a closer look at these crates' dependence on plugins it's discovered that the primary use case is generating Rust code at compile time. For gl-rs, this is done to bind a platform-specific and evolving API, and for rust-http this is done to make code more readable and easier to understand. In general generating code at compile time is quite a useful ability for other applications such as bindgen (C bindings), dom bindings (used in Servo), etc.

Cargo's and Rust's support for compile-time generated code is quite lacking today, and overhauling the build command provides a nice opportunity to rethink this sort of functionality.

With this motivation, this RFC proposes tweaking the include! macro to enable it to be suitable for the purpose of including generated code:

include!(concat!(env!("OUT_DIR"), "/generated.rs"));

Today this does not compile as the argument to include! must be a string literal. This RFC proposes tweaking the semantics of the include! macro to expand locally before testing for a string literal. This is similar to the behavior of the format_args! macro today.

Using this, Cargo crates will have OUT_DIR present for compilations, and any generated Rust code can be generated by the build command and placed into OUT_DIR. The include! macro would then be used to include the contents of the code inside of the appropriate module.

Case study: controlling linkage

One of the motivations for this RFC and redesign of the build command is to making linkage controls more explicit to Cargo itself rather than hardcoding particular linkages in source code. As proposed, however, this RFC does not bake any sort of dynamic-vs-static knowledge into Cargo itself.

This design area is intentionally left untouched by Cargo in order to reduce the number of moving parts and also in an effort to simplify build commands as much as possible. There are, however, a number of methods to control how libraries are linked:

  1. First and foremost is the ability to override libraries via Cargo configuration. Overridden native libraries are specified manually and override whatever the "default" would have been otherwise.
  2. Delegation to arbitrary code running in build scripts allow the possibility of specification through other means such as environment variables.
  3. Usage of common third-party build tools will allow for conventions about selecting linkage to develop over time.

Note that points 2 and 3 are intentionally vague as this RFC does not have a specific recommendation for how scripts or tooling should respect linkage. By relying on a common set of dependencies to find native libraries it is envisioned that the tools will grow a convention through which a linkage preference can be specified.

For example, a possible implementation of pkg-config will be discussed. This tool can be used as a first-line-defense to help locate a library on the system as well as its dependencies. If a crate requests that pkg-config find the library foo, then the pkg-config crate could inspect some environments variables for how it operates:

  • If FOO_NO_PKG_CONFIG is set, then pkg-config immediately returns an errors. This helps users who want to force pkg-config to not find a package or force the package to build a statically linked fallback.
  • If FOO_DYNAMIC is set, then pkg-config will only succeed if it finds a dynamic version of foo. A similar meaning could be applied to FOO_STATIC.
  • If PKG_CONFIG_ALL_DYNAMIC is set, then it will act as if the package foo is requested by be dynamic specifically (similarly for static linking).

Note that this is not a concrete design, this is just meant to be an example to show how a common third-party tool can develop a convention for controlling linkage not through Cargo itself.

Also note that this can mean that cargo itself may not succeed "by default" in all cases, or larger projects with more flavorful configurations may want to pursue more fine-tuned control over how libraries are linked. It is intended that cargo will itself be driven with something such as a Makefile to perform this configuration (be it environment or in files).

Drawbacks

  • The system proposed here for linking native code is in general somewhat verbose. In theory well designed third-party Cargo crates can alleviate this verbosity by providing much of the boilerplate, but it's unclear to what extent they'll be able to alleviate it.

  • None of the third-party crates with "convenient build logic" currently exist, and it will take time to build these solutions.

  • Platform specific dependencies mean that the entire package graph must always be downloaded, regardless of the platform.

  • In general dealing with linkage is quite complex, and the conventions/systems proposed here aren't exactly trivial and may be overkill for these purposes.

  • As can be seen in the example repository, platform dependencies are quite verbose and are difficult to work with when you actually want a negation instead of a positive platform to include.

  • Features themselves will also likely need to be platform-specific, but this runs into a number of tricky situations and needs to be fleshed out.

Alternatives

  • It has been proposed to support the links manifest key in the features section as well. In the proposed scheme you would have to create an optional dependency representing an optional native dependency, but this may be too burdensome for some cases.

  • The build command could instead take a script from an external package to run instead of a script inside of the package itself. The major drawback of this approach is that even the tiniest of build scripts require a full-blown package which needs to be uploaded to the registry and such. Due to the verboseness of so many packages, this was decided against.

  • Cargo remains fairly "dumb" with respect to how native libraries are linked, and it's always a possibility that Cargo could grow more first-class support for dealing with the linkage of C libraries.

Unresolved questions

None