Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproducible outputs without build-environment-specific command lines #87325

Closed
danakj opened this issue Jul 20, 2021 · 18 comments
Closed

Reproducible outputs without build-environment-specific command lines #87325

danakj opened this issue Jul 20, 2021 · 18 comments
Labels
A-reproducibility Area: Reproducible / deterministic builds C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@danakj
Copy link
Contributor

danakj commented Jul 20, 2021

Hello,

We in Chromium are working on integrating Rust into our distributed build system. We really appreciate the work done here, as reproducible builds are a requirement for such a plan. However, while this allows the binary output of rustc to be reproducible, we have noticed that the (fully resolved) command-line itself ends up not being so. This destroys our ability to cache build objects between bots and/or developers, as any change in the command-line requires a recompilation (such as changing optimization flags).

Clang has resolved this by providing a -fdebug-compilation-dir flag, in addition to the equivalent of rustc's --remap-path-prefix (as -fdebug-prefix-map in Clang).

This problem is discussed, in the context of Clang, in this blog post: https://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html

We would like to propose a similar flag for rustc. For simplicity we'd propose similar naming and behaviour as Clang's, in order to reuse previous work.

rustc --debug-compilation-dir . would be exactly equivalent to rustc --remap-path-prefix $(pwd)=.

In terms of ordering and presidence, it can follow the exact same rules as multiple --remap-path-prefix arguments.

I originally posted this in issue #38322 and moving to a separate issue here.

@jyn514
Copy link
Member

jyn514 commented Jul 20, 2021

However, while this allows the binary output of rustc to be reproducible, we have noticed that the (fully resolved) command-line itself ends up not being so. This destroys our ability to cache build objects between bots and/or developers, as any change in the command-line requires a recompilation (such as changing optimization flags).

Do you mean that rustc is invalidating its own cache? Or that you have some build system around rustc that has caching based on the CLI args? The first problem should have been fixed in #84233. The second I'm not sure should belong in rustc - couldn't you change your build system not to expand the shell command before doing the caching?

@jyn514
Copy link
Member

jyn514 commented Jul 20, 2021

See also rust-lang/rfcs#3127, I'm not sure if that would help or not.

@danakj
Copy link
Contributor Author

danakj commented Jul 21, 2021

The Chromium build system consists of
GN - this generates the set of inputs and outputs and the command lines to get from on to the other
Ninja - this distributes the set of actions to run across cores to build in a highly efficient parallel way
Goma - this proxies the command line to a remote compiler on a bot which caches the output. If the cache is valid you skip compiling that object

One of the inputs to any cache system must be the command line. If you change the command line flags in any way, the same binary can’t be returned. And the command line must be expanded to know what would be passed to the compiler. However rustc forces you to choose between:

  • build determinism but you have a bot/user-specific command line
  • universally same command line but lose build determinism in debug builds

We can not use rustc with goma without resolving this conflict of choice that rustc currently presents.

Thus, we are asking for rustc to provide a means to produce deterministic builds without a different command line for each build environment.

While its semi-possible to solve this in the cacheing layer, it would require the cache to understand perfectly the command line arguments of rustc so it can know if a given command line will produce the same output. This would be flaky and unreliable, so we would like to solve this in the same way that Clang has, by providing a flag to remap the current working directory specifically.

rust-lang/rfcs#3127 looks to be in a similar spirit but we do not use cargo, and it’s the ultimate invocation of rustc that matters for a distributed compilation system, and that is determined by the rustc compiler flags.

Thanks for asking!

@jyn514
Copy link
Member

jyn514 commented Jul 21, 2021

While its semi-possible to solve this in the cacheing layer, it would require the cache to understand perfectly the command line arguments of rustc so it can know if a given command line will produce the same output.

What's wrong with @jsgf's suggestion?

if you wrap rustc in a shell script and put the --remap-path-prefix $(cwd)=. there then it would solve this problem.

@jsgf
Copy link
Contributor

jsgf commented Jul 21, 2021

Goma - this proxies the command line to a remote compiler on a bot which caches the output. If the cache is valid you skip compiling that object

Unless Goma guarantees the same cwd on every host you need to expand $(pwd) late anyway. If that's the case you need to either have this PR, or more generally a wrapper script.

How does this build system handle env vars which may contain a path (eg include!(env!("SOMEVAR")))?

@danakj
Copy link
Contributor Author

danakj commented Jul 21, 2021

A wrapper script might work on some platforms, but it won't work on our Windows builders, as they do not spawn sub-processes, and doing so would be a prohibitive performance cost.

I am going to open an MFP for this to discuss further, as @michaelwoerister pointed me to, and I will add more about this there. Thanks for discussing with me.

@danakj
Copy link
Contributor Author

danakj commented Jul 21, 2021

How does this build system handle env vars which may contain a path (eg include!(env!("SOMEVAR")))?

Thanks for bringing this to my attention. The answer will likely be that we will ban the use of env!() in first-party code, and restrict the use of third-party crates that use it unconditionally. We have bots that would probably catch simple cases of this, but it could have snuck in somehow. I've been looking for ways that non-determinism could be introduced in a Rust build, and this is a good one!

@jsgf
Copy link
Contributor

jsgf commented Jul 22, 2021

A wrapper script might work on some platforms, but it won't work on our Windows builders, as they do not spawn sub-processes, and doing so would be a prohibitive performance cost.

I'm not sure I understand what you mean by "they do not spawn sub-processes" - isn't running rustc spawning a subprocess? If the "rustc" it's invoking is a wrapper, how would it know? And why would it be a prohibitive performance cost? If the cost of invoking rustc takes time T, then surely the invocation cost of a wrapper isn't going to be much more than 2T? And given that Rust compilation is famously not all that fast, the invocation overhead is pretty small vs the total runtime.

and restrict the use of third-party crates that use it unconditionally

This could be tricky, depending on how wide and deep your third-party dep graph is. It isn't common in the absolute sense, but lots of dep subgraphs contain some crate which does something like this.

Speaking of which, how are you producing build rules for third-party crates? Are you hand writing them, or using some tool to compute them from Cargo info? How do you deal with build scripts?

(I have a lot of practical experience in building Rust with Buck in a large codebase, so I understand a lot of the problems you're facing.)

@jsgf
Copy link
Contributor

jsgf commented Jul 22, 2021

We have bots that would probably catch simple cases of this, but it could have snuck in somehow. I've been looking for ways that non-determinism could be introduced in a Rust build, and this is a good one!

BTW I have a stalled draft RFC which proposes making the env entirely controllable from the command-line - that is, the "real" process env is ignored, and replaced with command-line options which set the effective env for env!() etc. rust-lang/rfcs#2794

I'd be interested to know if this solves any problems for you (or what what changes would be needed to make it helpful).

@danakj
Copy link
Contributor Author

danakj commented Jul 22, 2021

Let me come back to some of this in the MCP, which I am working on still. Sorry for the delay, I'd like it to be as robust as possible.

Speaking of which, how are you producing build rules for third-party crates? Are you hand writing them, or using some tool to compute them from Cargo info? How do you deal with build scripts?

https://gn.googlesource.com/cargo-gnaw/ is a tool for consuming cargo and generating GN rules.

BTW I have a stalled draft RFC which proposes making the env entirely controllable from the command-line - that is, the "real" process env is ignored, and replaced with command-line options which set the effective env for env!() etc. rust-lang/rfcs#2794

This is awesome, and might allow us to make more third-party crates possible to use. I've shared it to the relevant folks internally as well to get more eyes toward it. We're super early-days and just trying to make C++ and Rust compile together on top of our infrastructure right now, so it might take some time to come back to that. But thank you for the link!

@fangism
Copy link

fangism commented Jul 26, 2021

Regarding --remap-path-prefix, we were able to remove that from all of our rustc invocations and require that paths be relative to the build dir, which makes the remote commands cache-able. We continue to replace uses of absolute paths in our build.

@jsgf
Copy link
Contributor

jsgf commented Jul 26, 2021

@fangism Abs paths can be hard to eliminate. One case is where someone is doing include!(env!("VAR")). In this case VAR is interpreted relative to the file that's doing the including, but if you don't know precisely where that is, or it's from multiple files in different directories, then the most robust thing to do is make VAR an abs path.

@fangism
Copy link

fangism commented Jul 27, 2021

We also changed uses of include! and env! variables to be path-relative as well. So far this has not been a problem, as sources remain in stable locations.

@the8472
Copy link
Member

the8472 commented Jul 27, 2021

rustc --debug-compilation-dir . would be exactly equivalent to rustc --remap-path-prefix $(pwd)=.

Would absolutizing relative paths on the left side of remap-path-prefix be an option? Then you could do --remap-path-prefix .=. and it would substitute whatever . resolves to with .

@jonas-schievink jonas-schievink added A-reproducibility Area: Reproducible / deterministic builds C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 27, 2021
@danakj
Copy link
Contributor Author

danakj commented Jul 27, 2021

Yep that would be another way to approach it. If all paths are absolute before remapping (I think so?) then it would be okay as it wouldn't get in the way of remapping relative paths (since they wouldn't exist at remap time). Though I should note it seems a little unclear as .=. looks like it wouldn't transform anything from reading it. Naively you'd expect either both . to be resolved to an absolute path or neither. This is similar in spirit to the %CWD% idea in the MCP.

@the8472
Copy link
Member

the8472 commented Jul 27, 2021

This is similar in spirit to the %CWD% idea in the MCP.

Except that it uses existing path resolution semantics instead of making some previously valid filenames reserved and magical.

@jsgf
Copy link
Contributor

jsgf commented Jul 27, 2021

Would absolutizing relative paths on the left side...

I guess that could work if you define that precisely as "map (effective) leading ./ to current working dir". In general canonicalizing (or even absoluting) paths can get tricky in the presence of symlinks, bind mounts, and (I presume) whatever mechanisms Windows has. If you can have multiple textually distinct pathnames which all refer to the current working dir via different aliases, and you want a remap rule to make them relative, then it can get pretty awkward.

Manishearth added a commit to Manishearth/rust that referenced this issue Sep 14, 2021
…haelwoerister

Introduce -Z remap-cwd-prefix switch

This switch remaps any absolute paths rooted under the current
working directory to a new value. This includes remapping the
debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`.

Importantly, this flag does not require passing the current working
directory to the compiler, such that the command line can be
run on any machine (with the same input files) and produce the
same results. This is critical property for debugging compiler
issues that crop up on remote machines.

This is based on adetaylor's rust-lang@dbc4ae7

Major Change Proposal: rust-lang/compiler-team#450
Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
Manishearth added a commit to Manishearth/rust that referenced this issue Sep 14, 2021
…haelwoerister

Introduce -Z remap-cwd-prefix switch

This switch remaps any absolute paths rooted under the current
working directory to a new value. This includes remapping the
debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`.

Importantly, this flag does not require passing the current working
directory to the compiler, such that the command line can be
run on any machine (with the same input files) and produce the
same results. This is critical property for debugging compiler
issues that crop up on remote machines.

This is based on adetaylor's rust-lang@dbc4ae7

Major Change Proposal: rust-lang/compiler-team#450
Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
Manishearth added a commit to Manishearth/rust that referenced this issue Sep 14, 2021
…haelwoerister

Introduce -Z remap-cwd-prefix switch

This switch remaps any absolute paths rooted under the current
working directory to a new value. This includes remapping the
debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`.

Importantly, this flag does not require passing the current working
directory to the compiler, such that the command line can be
run on any machine (with the same input files) and produce the
same results. This is critical property for debugging compiler
issues that crop up on remote machines.

This is based on adetaylor's rust-lang@dbc4ae7

Major Change Proposal: rust-lang/compiler-team#450
Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
Manishearth added a commit to Manishearth/rust that referenced this issue Sep 14, 2021
…haelwoerister

Introduce -Z remap-cwd-prefix switch

This switch remaps any absolute paths rooted under the current
working directory to a new value. This includes remapping the
debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`.

Importantly, this flag does not require passing the current working
directory to the compiler, such that the command line can be
run on any machine (with the same input files) and produce the
same results. This is critical property for debugging compiler
issues that crop up on remote machines.

This is based on adetaylor's rust-lang@dbc4ae7

Major Change Proposal: rust-lang/compiler-team#450
Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
Manishearth added a commit to Manishearth/rust that referenced this issue Sep 15, 2021
…haelwoerister

Introduce -Z remap-cwd-prefix switch

This switch remaps any absolute paths rooted under the current
working directory to a new value. This includes remapping the
debug info in `DW_AT_comp_dir` and `DW_AT_decl_file`.

Importantly, this flag does not require passing the current working
directory to the compiler, such that the command line can be
run on any machine (with the same input files) and produce the
same results. This is critical property for debugging compiler
issues that crop up on remote machines.

This is based on adetaylor's rust-lang@dbc4ae7

Major Change Proposal: rust-lang/compiler-team#450
Discussed on rust-lang#38322. Would resolve issue rust-lang#87325.
@ehuss
Copy link
Contributor

ehuss commented Oct 1, 2021

Now that this is implemented, closing in favor of tracking issue #89434.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-reproducibility Area: Reproducible / deterministic builds C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

7 participants