Skip to content

Commit

Permalink
ISLE: guard against stale generated source in default build config.
Browse files Browse the repository at this point in the history
Currently, the `build.rs` script that generates Rust source from the
ISLE DSL will only do this generation if the `rebuild-isle` Cargo
feature is specified. By default, it is not. This is based on the
principle that we (the build script) do not modify the source tree as
managed by git; git-managed files are strictly a human-managed and
human-edited resource. By adding the opt-in Cargo feature, a developer
is requesting the build script to perform an explicit action. (In my
understanding at least, this principle comes from the general philosophy
of hermetic builds: the output should be a pure function of the input,
and part of this is that the input is read-only. If we modify the source
tree, then all bets are off.)

Unfortunately, requiring the opt-in feature also creates a footgun that
is easy to hit: if a developer modifies the ISLE DSL source, but forgets
to specify the Cargo feature, then the compiler will silently be built
successfully with stale source, and will silently exclude any changes
that were made.

The generated source is checked into git for a good reason: we want DSL
compiler to not affect build times for the overwhelmingly common case
that Cranelift is used as a dependency but the backends are not being
actively developed. (This overhead comes mainly from building `islec`
itself.)

So, what to do? This PR implements a middle ground first described in
[this conversation](bytecodealliance#3506 (comment)), in which we:

- Generate a hash (SHA-512) of the ISLE DSL source and produce a
  "manifest" of ISLE inputs alongside the generated source; and
- Always read the ISLE DSL source, and see if the manifest is still
  valid, on builds that do not have the opt-in "rebuild" feature.

This allows us to know whether the ISLE compiler output would have been
the same (modulo changes to the DSL compiler itself, which are
out-of-scope here), without actually building the ISLE compiler and
running it.

If the compiler-backend developer modifies an ISLE source file and then
tries to build `cranelift-codegen` without adding the `rebuild-isle`
Cargo feature, they get the following output:

```text
  Error: the ISLE source files that resulted in the generated Rust source

        * src/isa/x64/lower/isle/generated_code.rs

  have changed but the generated source was not rebuilt! These ISLE source
  files are:

         * src/clif.isle
         * src/prelude.isle
         * src/isa/x64/inst.isle
         * src/isa/x64/lower.isle

  Please add `--features rebuild-isle` to your `cargo build` command
  if you wish to rebuild the generated source, then include these changes
  in any git commits you make that include the changes to the ISLE.

  For example:

    $ cargo build -p cranelift-codegen --features rebuild-isle

  (This build script cannot do this for you by default because we cannot
  modify checked-into-git source without your explicit opt-in.)
```

which will tell them exactly what they need to do to fix the problem!

Note that I am leaving the "Rebuild ISLE" CI job alone for now, because
otherwise, we are trusting whomever submits a PR to generate the correct
generated source. In other words, the manifest is a communication from
the checked-in tree to the developer, but we still need to verify that
the checked-in generated Rust source and the manifest are correct with
respect to the checked-in ISLE source.
  • Loading branch information
cfallin committed Nov 16, 2021
1 parent 5d5629d commit a1b1e39
Show file tree
Hide file tree
Showing 4 changed files with 196 additions and 49 deletions.
24 changes: 8 additions & 16 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions cranelift/codegen/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -41,6 +41,7 @@ criterion = "0.3"
cranelift-codegen-meta = { path = "meta", version = "0.78.0" }
isle = { path = "../isle/isle", version = "0.78.0", optional = true }
miette = { version = "3", features = ["fancy"] }
sha2 = "0.9.8"

[features]
default = ["std", "unwind"]
Expand Down
216 changes: 183 additions & 33 deletions cranelift/codegen/build.rs
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@

use cranelift_codegen_meta as meta;

use sha2::{Digest, Sha512};
use std::env;
use std::io::Read;
use std::process;
Expand Down Expand Up @@ -89,6 +90,10 @@ fn main() {
std::process::abort();
}
}
#[cfg(not(feature = "rebuild-isle"))]
{
check_isle_manifests(crate_dir).expect("Failed to check ISLE manifests");
}

let pkg_version = env::var("CARGO_PKG_VERSION").unwrap();
let mut cmd = std::process::Command::new("git");
Expand Down Expand Up @@ -127,29 +132,42 @@ fn main() {
.unwrap();
}

/// Rebuild ISLE DSL source text into generated Rust code.
/// Strip the current directory from the file paths, because `islec`
/// includes them in the generated source, and this helps us maintain
/// deterministic builds that don't include those local file paths.
fn make_isle_source_path_relative(
cur_dir: &std::path::PathBuf,
filename: std::path::PathBuf,
) -> std::path::PathBuf {
if let Ok(suffix) = filename.strip_prefix(&cur_dir) {
suffix.to_path_buf()
} else {
filename
}
}

/// Construct the list of compilations (transformations from ISLE
/// source to generated Rust source) that exist in the repository.
///
/// NB: This must happen *after* the `cranelift-codegen-meta` functions, since
/// it consumes files generated by them.
#[cfg(feature = "rebuild-isle")]
fn rebuild_isle(crate_dir: &std::path::Path) -> Result<(), Box<dyn std::error::Error + 'static>> {
use std::sync::Once;
static SET_MIETTE_HOOK: Once = Once::new();
SET_MIETTE_HOOK.call_once(|| {
let _ = miette::set_hook(Box::new(|_| {
Box::new(
miette::MietteHandlerOpts::new()
// This is necessary for `miette` to properly display errors
// until https://github.com/zkat/miette/issues/93 is fixed.
.force_graphical(true)
.build(),
)
}));
});
/// This list is used either to regenerate the Rust source in-tree (if
/// the `rebuild-isle` feature is enabled), or to verify that the ISLE
/// source in-tree corresponds to the ISLE source that was last used
/// to rebuild the Rust source (if the `rebuild-isle` feature is not
/// enabled).
///
/// The returned value is a vector of (output Rust code file, input
/// ISLE source files) tuples.
fn get_isle_compilations(
crate_dir: &std::path::Path,
) -> Result<Vec<(std::path::PathBuf, Vec<std::path::PathBuf>)>, std::io::Error> {
let cur_dir = std::env::current_dir()?;

let clif_isle = crate_dir.join("src").join("clif.isle");
let prelude_isle = crate_dir.join("src").join("prelude.isle");
let src_isa_x64 = crate_dir.join("src").join("isa").join("x64");
let clif_isle =
make_isle_source_path_relative(&cur_dir, crate_dir.join("src").join("clif.isle"));
let prelude_isle =
make_isle_source_path_relative(&cur_dir, crate_dir.join("src").join("prelude.isle"));
let src_isa_x64 =
make_isle_source_path_relative(&cur_dir, crate_dir.join("src").join("isa").join("x64"));

// This is a set of ISLE compilation units.
//
Expand All @@ -160,7 +178,7 @@ fn rebuild_isle(crate_dir: &std::path::Path) -> Result<(), Box<dyn std::error::E
// There should be one entry for each backend that uses ISLE for lowering,
// and if/when we replace our peephole optimization passes with ISLE, there
// should be an entry for each of those as well.
let isle_compilations = vec![
Ok(vec![
// The x86-64 instruction selector.
(
src_isa_x64
Expand All @@ -174,19 +192,73 @@ fn rebuild_isle(crate_dir: &std::path::Path) -> Result<(), Box<dyn std::error::E
src_isa_x64.join("lower.isle"),
],
),
];
])
}

let cur_dir = std::env::current_dir()?;
for (out_file, mut files) in isle_compilations {
for file in files.iter_mut() {
println!("cargo:rerun-if-changed={}", file.display());
/// Compute the content of the source manifest for all ISLE source
/// files that go into the compilation of one Rust file.
///
/// We store this alongside the `<generated_filename>.rs` file as
/// `<generated_filename>.manifest` and use it to verify that a
/// rebuild was done if necessary.
fn compute_isle_source_manifest(
source: &[std::path::PathBuf],
) -> Result<String, Box<dyn std::error::Error + 'static>> {
use std::fmt::Write;

let mut manifest = String::new();

for filename in source {
let content = std::fs::read(filename)?;
let mut hasher = Sha512::default();
hasher.update(content);
writeln!(
&mut manifest,
"{} {:x}",
filename.display(),
hasher.finalize()
)?;
}

// Strip the current directory from the file paths, because `islec`
// includes them in the generated source, and this helps us maintain
// deterministic builds that don't include those local file paths.
if let Ok(suffix) = file.strip_prefix(&cur_dir) {
*file = suffix.to_path_buf();
}
Ok(manifest)
}

/// Compute the manifest filename for the given generated Rust file.
fn manifest_filename(rust_target: &std::path::Path) -> std::path::PathBuf {
rust_target.with_extension("manifest")
}

/// Rebuild ISLE DSL source text into generated Rust code.
///
/// NB: This must happen *after* the `cranelift-codegen-meta` functions, since
/// it consumes files generated by them.
#[cfg(feature = "rebuild-isle")]
fn rebuild_isle(crate_dir: &std::path::Path) -> Result<(), Box<dyn std::error::Error + 'static>> {
use std::sync::Once;
static SET_MIETTE_HOOK: Once = Once::new();
SET_MIETTE_HOOK.call_once(|| {
let _ = miette::set_hook(Box::new(|_| {
Box::new(
miette::MietteHandlerOpts::new()
// This is necessary for `miette` to properly display errors
// until https://github.com/zkat/miette/issues/93 is fixed.
.force_graphical(true)
.build(),
)
}));
});

let isle_compilations = get_isle_compilations(crate_dir)?;

for (out_file, files) in &isle_compilations {
// First, remove the manifest, if any; we will recreate it
// below if the compilation is successful. Ignore error if no
// manifest was present.
let manifest = manifest_filename(out_file);
let _ = std::fs::remove_file(manifest);

for file in files {
println!("cargo:rerun-if-changed={}", file.display());
}

let code = (|| {
Expand Down Expand Up @@ -230,6 +302,17 @@ fn rebuild_isle(crate_dir: &std::path::Path) -> Result<(), Box<dyn std::error::E
std::fs::write(out_file, code)?;
}

// Write the manifest so that, in the default build configuration
// without the `rebuild-isle` feature, we can at least verify that
// no changes were made that will not be picked up. Note that we
// only write this *after* we write the source above, so no
// manifest is produced if there was an error.
for (out_file, files) in &isle_compilations {
let manifest = compute_isle_source_manifest(&files[..])?;
let filename = manifest_filename(out_file);
std::fs::write(filename, &manifest[..])?;
}

return Ok(());

fn rustfmt(code: &str) -> std::io::Result<String> {
Expand Down Expand Up @@ -259,3 +342,70 @@ fn rebuild_isle(crate_dir: &std::path::Path) -> Result<(), Box<dyn std::error::E
Ok(String::from_utf8(data).expect("rustfmt always writs utf-8 to stdout"))
}
}

/// Check that the generated source, which we are *not* rebuilding in
/// this configuration (the default) and instead relying on the
/// checked-into-git generated copy, still corresponds to the ISLE
/// source. We do this by computing a hash of the ISLE source and
/// checking it against a "manifest" that is also checked into git,
/// alongside the generated Rust. If there is a mismatch, we signal a
/// build error and suggest to the user that they might wish to build
/// with the `rebuild-isle` feature.
///
/// (Why not include this feature by default? Because the build
/// process must not modify the checked-in source by default; any
/// checked-in source is a human-managed bit of data, and we can only
/// act as an agent of the human developer when explicitly requested
/// to do so. This manifest check is a middle ground that ensures this
/// explicit control while also avoiding the easy footgun of "I
/// changed the ISLE, why isn't the compiler updated?!".)
#[cfg(not(feature = "rebuild-isle"))]
fn check_isle_manifests(
crate_dir: &std::path::Path,
) -> Result<(), Box<dyn std::error::Error + 'static>> {
let isle_compilations = get_isle_compilations(crate_dir)?;

let mut had_error = false;
for (out_file, files) in &isle_compilations {
for file in files {
println!("cargo:rerun-if-changed={}", file.display());
}

let manifest = std::fs::read_to_string(manifest_filename(out_file))?;
let expected_manifest = compute_isle_source_manifest(&files[..])?;
if manifest != expected_manifest {
had_error = true;
eprintln!("");
eprintln!("Error: the ISLE source files that resulted in the generated Rust source");
eprintln!("");
eprintln!(" * {}", out_file.display());
eprintln!("");
eprintln!(
"have changed but the generated source was not rebuilt! These ISLE source"
);
eprintln!("files are:");
eprintln!("");
for file in files {
eprintln!(" * {}", file.display());
}
}
}

if had_error {
eprintln!("");
eprintln!("Please add `--features rebuild-isle` to your `cargo build` command");
eprintln!("if you wish to rebuild the generated source, then include these changes");
eprintln!("in any git commits you make that include the changes to the ISLE.");
eprintln!("");
eprintln!("For example:");
eprintln!("");
eprintln!(" $ cargo build -p cranelift-codegen --features rebuild-isle");
eprintln!("");
eprintln!("(This build script cannot do this for you by default because we cannot");
eprintln!("modify checked-into-git source without your explicit opt-in.)");
eprintln!("");
std::process::abort();
}

Ok(())
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
src/clif.isle 5ce4617f311dd072ffe5d565ed48a7ce13ec8cbabe453809f15a76021e157d0e277c0d6da87f249fc1d6d2ed28669e5d1a8d62eb84cf27dad3c8bc58a5ce6b42
src/prelude.isle 01a60fc84c97efeff1b0936180127f636a68624e24a5dcb952f0ed623b005967f5a259ee0571cd70f6f6ea2c6f8de4713181418b8d30408ff4e0c963c86d52b2
src/isa/x64/inst.isle a782a88f21a20640e6d27c9d2465ba1d0b6b83a0b35a61d28fb0e33c159cfddaf9a6db870d769b770390284257555f754a7ccb9ddf30929892a6bc946407a79d
src/isa/x64/lower.isle 8477b4437b01f5b1eaad037d315f3c542e8f66ec89b5506044686965c23e32722a2592cf7d3e213d7543bc2df3532f4ae5c83260025c720ceee06ea4fe272b72

0 comments on commit a1b1e39

Please sign in to comment.