Skip to content
This repository has been archived by the owner on Feb 28, 2023. It is now read-only.

Include well-known source input dependencies for Rust compilation #51

Merged
merged 1 commit into from
May 17, 2021

Conversation

Xanewok
Copy link
Contributor

@Xanewok Xanewok commented Apr 27, 2021

This is a somewhat minor hack to fix compilation in the distributed scenario, where a package needs to read its own manifest file.

For more context, Cargo already packs the manifest and other [package.include]d files in the .crate archive and in respective the .cargo/registry/src/<registry>/<package-id> directory, however sccache relies only on the explicit dep-info created by rustc for a regular crate compilation. While technically correct for a "barebones" Rust crate, the packages compiled and distributed by Cargo may also implicitly depend on the additional, aforementioned packed files.

This unconditionally marks Cargo.toml as the input for the Rust compilation, since
a) every Cargo package is known to have that file present in its crate archive/build environment
b) the manifest file is a relatively small text file so the overhead of packing it and transmitting it over the wire should be negligible
c) we don't use non-Cargo Rust crates (:warning: I need to test if it hard errors if there's not a Cargo.toml present for some crate)
d) this is 99% enough to get Substrate compiling

Alternatives

a) Use the include_bytes! with unnamed const trick:

  const _: &[u8] = include_bytes!(concat!(env!("CARGO_MANIFEST_DIR"), "/Cargo.toml"));

This works on stable but includes a file in the compilation output which may or may not be stripped; This would have to be repeated for every crate that uses parity-scale-codec (...that's too many)

b) Wait for rust-lang/rust#84029 to land and use it in the nightly toolchain setting. That's the most straightforward solution but we'd only support the nightly toolchain and the time-to-stabilise might prove to be too long if we want to use sccache using a Rust stable toolchain.

c) Remove the rename-detection feature in parity-scale-codec-derive (breaking change but not sure if include that in stability guarantees) and migrate every crate that uses parity-scale-codec to import it directly, without renaming to codec via Cargo.toml (or other names). This imposes a churn and there's not a reliable/elegant way to prevent someone from breaking the build as this issue only pops up when using sccache

I'm 50-50 whether we should upstream this patch; ideally the crate developers should use rust-lang/rust#84029 directly when it lands on stable but always including the Cargo.toml manifest seems like it gives some degree of out-of-box support for Cargo packages without that big of a runtime cost.

EDIT: This now also includes the src/websockets.js for libp2p-wasm-ext which is needed for Substrate compilation and in general introduces a hard-coded list which additional files should be included based on its crate name (as a reasonable heuristic). In the future we might move that off to a separate configurable knob in the client configuration file if that seems like a better place for this kind of feature.

@drahnr
Copy link
Contributor

drahnr commented Apr 28, 2021

tl;dr: yes, but we should make it an optional (opt-out) thing to include the cargo file until we have means to measure the impact of such a change. We need it for the time being until track_path! is stabilized. So 👍


This is a somewhat minor hack to fix compilation in the distributed scenario, where a package needs to read its own manifest file.

For more context, Cargo already packs the manifest and other [package.include]d files in the .crate archive and in respective the .cargo/registry/src/<registry>/<package-id> directory, however sccache relies only on the explicit dep-info created by rustc for a regular crate compilation. While technically correct for a "barebones" Rust crate, the packages compiled and distributed by Cargo may also implicitly depend on the additional, aforementioned packed files.

This unconditionally marks Cargo.toml as the input for the Rust compilation, since
a) every Cargo package is known to have that file present in its crate archive/build environment
b) the manifest file is a relatively small text file so the overhead of packing it and transmitting it over the wire should be negligible

I think we generally need a way to measure these things :) the impact is sometimes rather surprising

c) we don't use non-Cargo Rust crates (warning I need to test if it hard errors if there's not a Cargo.toml present for some crate)

We can't really assume this. There are codebases which do not use cargo, i.e. rust as part of C or C++ project.

d) this is 99% enough to get Substrate compiling

🎉

Alternatives

a) Use the include_bytes! with unnamed const trick:

  const _: &[u8] = include_bytes!(concat!(env!("CARGO_MANIFEST_DIR"), "/Cargo.toml"));

This works on stable but includes a file in the compilation output which may or may not be stripped; This would have to be repeated for every crate that uses parity-scale-codec (...that's too many)

Not entirly sure we'd have to repeat it, or if that hack would have to be part of the scale codec created code? Either way, this is a hack and I can see non-parity crates refusing to do such treachery.

b) Wait for rust-lang/rust#84029 to land and use it in the nightly toolchain setting. That's the most straightforward solution but we'd only support the nightly toolchain and the time-to-stabilise might prove to be too long if we want to use sccache using a Rust stable toolchain.

For stable we have to do something in the meantime, 💯

c) Remove the rename-detection feature in parity-scale-codec-derive (breaking change but not sure if include that in stability guarantees) and migrate every crate that uses parity-scale-codec to import it directly, without renaming to codec via Cargo.toml (or other names). This imposes a churn and there's not a reliable/elegant way to prevent someone from breaking the build as this issue only pops up when using sccache

The issue really is limited to renames combined with proc macro crates, right? So it should be rather limited I would assume (this is not a proof, prove me wrong!)?
Not a fan of changing source for the sake of a tool. So for me that's a nay

I'm 50-50 whether we should upstream this patch; ideally the crate developers should use rust-lang/rust#84029 directly when it lands on stable but always including the Cargo.toml manifest seems like it gives some degree of out-of-box support for Cargo packages without that big of a runtime cost.

I tried this before which seems quite similar mozilla/sccache#719 and it was closed, so I would not bother to do so.

Copy link
Contributor

@drahnr drahnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Xanewok
Copy link
Contributor Author

Xanewok commented Apr 28, 2021

Not entirely sure we'd have to repeat it, or if that hack would have to be part of the scale codec created code? Either way, this is a hack and I can see non-parity crates refusing to do such treachery.

Hm, that's a fair point, I guess you could technically quote! that into the consumer crate which uses the derive macros... That sounds like a good workaround and I guess it's technically correct in general to mark Cargo.toml as required file as it directly affects the compilation.

Unfortunately, I verified that using the unnamed const trick, the included file ends up in the .rlibs - whether it's in release or debug.

The issue really is limited to renames combined with proc macro crates, right?

Yes, the only reason in the first place for requiring Cargo.toml in our case is to detect renames via proc-macro-crate (which is used by parity-scale-codec-derive). Interestingly enough, this problem is not yet "solved" in Rust proper, see rust-lang/rust#54363. For example serde uses a special #[serde(crate = "...")] attribute to work around that.

@gww-parity
Copy link
Contributor

I wonder, taking a brief look, title says, about Cargo.toml, what about Cargo.lock ?

@Xanewok
Copy link
Contributor Author

Xanewok commented Apr 28, 2021

That's a fair question - in general, dependencies' lockfiles are ignored and only the lockfile of the primary package or the workspace is used. The lockfile is used as part of the dependency resolution, which happens on the client side - with dependencies resolved, the corresponding crate compilation requests are sent from the client to the build servers, so there's no need to transfer Cargo.lock anywhere for the build to happen/succeed.

@Xanewok
Copy link
Contributor Author

Xanewok commented May 5, 2021

thread 'compiler::rust::test::test_generate_hash_key' panicked at 'called `Result::unwrap()` on an `Err` value: Failed to open 
file for hashing: "/tmp/sccache_testvclnIp/Cargo.toml"

I guess this needs a sanity check beforehand if the file exists at all

@Xanewok Xanewok force-pushed the igor-add-cargo-toml branch 3 times, most recently from 152f078 to 27eadba Compare May 10, 2021 19:41
@Xanewok Xanewok requested a review from drahnr May 10, 2021 20:37
@Xanewok
Copy link
Contributor Author

Xanewok commented May 10, 2021

thread 'compiler::rust::test::test_generate_hash_key' panicked at 'called `Result::unwrap()` on an `Err` value: Failed to open 
file for hashing: "/tmp/sccache_testvclnIp/Cargo.toml"

I guess this needs a sanity check beforehand if the file exists at all

Fixed in 27eadba

@Xanewok Xanewok force-pushed the igor-add-cargo-toml branch from 27eadba to 8aaf775 Compare May 14, 2021 12:22
@Xanewok Xanewok changed the title Always include Cargo.toml as input for Rust compilation Include well-known source input dependencies for Rust compilation May 14, 2021
.map(|(_, value)| Path::new(value));
const KNOWN_AUX_DEP_FILES: &[(Option<&str>, &str)] = &[
(None, "Cargo.toml"),
(Some("libp2p_wasm_ext"), "src/websockets.js"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those part of the includes = [] set in the manifest? But on the other handside there might be other rather large files as well 🧐 - I guess it's good as is.
We might want to consider to allow easy modification via env var?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has to be included in the packaged and uploaded .crate file to crates.io - otherwise, regular builds wouldn't be possible. The libp2p-wasm-ext has no include/exclude, so by default it includes everything in directory IIRC.
Here's a list of files in the package:

xanewok@faerun-dev:~/repos/rust-libp2p/transports/wasm-ext (master)$ cargo package --list
.cargo_vcs_info.json
CHANGELOG.md
Cargo.toml
Cargo.toml.orig
src/lib.rs
src/websockets.js

We might want to consider to allow easy modification via env var?

Do you mean unconditionally including Cargo.toml for every package?
We have to include src/websockets.js for libp2p-wasm-ext, otherwise we can't ever build it via sccache-dist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has to be included in the packaged and uploaded .crate file to crates.io - otherwise, regular builds wouldn't be possible. The libp2p-wasm-ext has no include/exclude, so by default it includes everything in directory IIRC.
Here's a list of files in the package:

xanewok@faerun-dev:~/repos/rust-libp2p/transports/wasm-ext (master)$ cargo package --list
.cargo_vcs_info.json
CHANGELOG.md
Cargo.toml
Cargo.toml.orig
src/lib.rs
src/websockets.js

I think that status quo of inclusion logic is good as is :)

We might want to consider to allow easy modification via env var?

Do you mean unconditionally including Cargo.toml for every package?
We have to include src/websockets.js for libp2p-wasm-ext, otherwise we can't ever build it via sccache-dist.

I meant more on the extensability front/generalization i.e. what if another crate pops up in the need of file? We want to have rather quick turnaround, so a CACHEPOT_EXTRA=$crate:$resource_path,.. might be something for a follow up PR (#idea) and I have a feeling that even with the deps tracking this might be a quick fix before it's going to be adopted in the ecosystem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that - yeah, that's definitely a good idea. I was thinking of introducing it in a configuration file but env var sounds easy/quicker for now 👍

@drahnr
Copy link
Contributor

drahnr commented May 17, 2021

@Xanewok all my comments can be addressed in a followup as needed, this is good for now and remedies a roadblock 👍

@Xanewok
Copy link
Contributor Author

Xanewok commented May 17, 2021

Awesome, thank you for your review! 🙏

@Xanewok Xanewok merged commit 7ede5c4 into master May 17, 2021
@Xanewok Xanewok deleted the igor-add-cargo-toml branch May 17, 2021 15:35
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants