-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Set CARGO_CHECK environment variable when type checking #3748
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW,
- The corresponding Cargo issue thread: Inform build scripts whether cargo is
check
ing orbuild
ing cargo#4001. - A recent summary of that thread: Inform build scripts whether cargo is
check
ing orbuild
ing cargo#4001 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the links! I searched Rust issues and RFCs but forgot to search the cargo repo
It doesn't look like the discussion has moved much in years, is there a reason against forcing the issue via RFC at this point?
The Rust landscape has changed significantly since the last major activity in the discussion and rust-analyzer seems to be the de-facto LSP implementation at this point. What are the downsides of introducing an LSP specific check
command whose only difference (for now) is setting that environment variable? Or the alternative, allowing callers to set a CARGO_NO_BUILD
environment variable themselves , one that is officially sanctioned and supported by Rust/Cargo (if not set automatically)?
I understand and appreciate the hesitance to stabilize contracts without ironing out all of the generalities but the discussion about cargo modes and the cargo check && cargo build
feels very ivory tower. Rust is increasingly being used to integrate with C++ code beyond *-sys
crates as cxx and other tools mature, and I feel like a way to notify build scripts not to run extraneous steps is very much needed regardless of the aforementioned issues. Personally I always configure rust-analyzer
to use a subdirectory so that it doesn't block cargo build
anyway, so build caching between cargo check
and cargo build
wouldn't apply (and I'm curious what fraction of the community does too)
That said, I'm biased as I feel acute pain with cxx/cxx-qt, where sccache doesn't seem to help. Worst case scenario I can set the environment variables myself and use a custom fork.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like the discussion has moved much in years, is there a reason against forcing the issue via RFC at this point?
No. While RFC doesn't really force anything until accepted, it is good to open a discussion. Sometimes people hang out in https://internals.rust-lang.org/ first for pre-RFC, before preparing a more formal proposal here.
|
||
2. Do Nothing: If we do nothing, build scripts will continue to run all build steps even when it's not necessary, significantly impacting Rust ergonomics when interfacing with exernal languages. | ||
|
||
# Prior art |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind providing link to each prior art so we can find reference easier?
- Projects using Protocol Buffers generate Rust code from .proto files | ||
- bindgen generates Rust bindings from C/C++ headers | ||
|
||
Currently, every time rust-analyzer runs `cargo check`, all build scripts must execute their full build process, including steps like compiling C++ code that are only needed for linking but not for type checking. This significantly impacts IDE responsiveness, especially in projects with complex build scripts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something seems missing from this description. In my understanding of how to use Cargo, a well-behaved build script should produce an appropriate set of rerun-if-changed
directives, meaning that the build script will not be run “every time” unless its dependencies are actually changed.
There are of course specific scenarios where it will be rerun regardless, such as if you are editing one of the build-dependencies
, but this paragraph suggests that build scripts predictably run on every cargo check
in every project that has one, not just the first cargo check
or cargo build
. In my mind, if that’s happening, that indicates a bug in the build script. (Or at least an incompleteness — rerunning every time is the default behavior of Cargo given zero rerun-if-*
directives, but for any build script that runs long enough to notice, that should be addressed by adding those directives.)
Can you clarify this? Is the build script in this scenario unable to use rerun-if
directives for some reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct, I miswrote that paragraph. I changed it to Currently, every time rust-analyzer runs
cargo check, the build script in the changed crate must execute its full build process, ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that’s still not true in general. If a build script reruns, then one of the following situations applies:
- The build script does not have any
rerun-if-*
dependencies declared; that can and should be fixed. - You modified one of its
[build-dependencies]
. - You modified a file that is a
rerun-if-changed
dependency of the build script, so it should rerun. - You (implicitly or explicitly) changed a environment variable the build script depends on with
rerun-if-env-changed
, e.g. by runningcargo check
from two different environments (IDE and separate terminal). Again, this is an actual change in a dependency, though possibly an unintended one.
Changing the Rust code of a crate does not guarantee that its build script will rerun. Build scripts are rerun only if one of their dependencies changes, or if they failed to declare dependencies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see what you mean now.
The cxx
crate parses Rust source files to generate bindings from cxx::bridge
macros and the Rust source files get added via cargo::rerun-if-changed
. Normally this wouldn't be a problem since C/C++ bindings are rarely changed, but in cxx-qt
projects the bridge code is the interface between QML/JS GUI code and the Rust backend so it receives a lot more attention while incurring additional QT build tooling overhead.
I'll add a paragraph explaining that detail. I didn't realize the issue I was experiencing is so specific to cxx
though
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
Cargo will set the `CARGO_CHECK` environment variable to `true` when running `cargo check` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is unwise to name and describe this as for cargo check
in particular, for two reasons:
- A future-compatibility hazard, especially in context of extending this to other modes as noted in the future possibilities section — are build scripts going to have to check for
CARGO_CHECK || CARGO_DOC || CARGO_CLIPPY
to be efficient? I think we’d see future arguments over whether commands are “check-like enough to count” rather than introducing more, which would spread confusion and create a tradeoff we can just avoid from the start. - Build script authors might decide to intentionally do non-additive things based on the command in use (like generating different bindings code that’s more development-friendly in some way), leading to incorrect caching.
I think it would be better to name and describe as something like:
Cargo will set the
CARGO_WILL_NOT_LINK
environment variable if Cargo’s build plan is known not to include any linking (and therefore is known not to depend on any foreign object files the build script might produce, or any of itscargo::rustc-link-*
directives). WheneverCARGO_WILL_NOT_LINK
is present, you may skip producing object files. All generated Rust code or other inputs to the Rust compilation (such as files forinclude_str!
orcargo::rustc-cfg
directives) must be unaffected.The value of
CARGO_WILL_NOT_LINK
is reserved for making finer distinctions in the future; currently, it is always set to"1"
, but build scripts should check only for its presence.
This does change the behavior because the definition includes all of cargo check
, cargo fix
, cargo doc
, and cargo clippy
, but I think that’s the right thing to do. It’s also very specific about what the build script may do and should not do, to ensure that the results of the build script can still validly be cached.
(I specified the value to be "1"
and not "true"
because that's what Cargo uses for feature flags, the other existing example of “boolean” environment variables.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are great points!
I think your alternative is a better approach, especially not to step on the toes of the parallel cargo modes discussion. I'll wait for more input and then probably update the RFC to the alternative method after the holidays
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a handful of environment variables that don't have the CARGO_
prefix, e.g. OUT_DIR
TARGET
NUM_JOBS
OPT_LEVEL
. My read is that these unprefixed variables are loosely meant to be portable to other build systems that might want to reuse build scripts. Since the proposed feature is certainly something that other build systems could make use of, maybe it should also be unprefixed?
A name like CHECK_ONLY
would be reasonably intuitive. WILL_NOT_LINK
or SKIP_CODEGEN
are a bit more technical but maybe more accurate with e.g. rustdoc
. NO_CODEGEN
is consistent with the unstable rustc -Zno-codegen
. I think Cargo currently gets the check-only functionality by passing --emit=dep-info,metadata
.
List of the existing environment variables for reference https://doc.rust-lang.org/cargo/reference/environment-variables.html
generate_rust_bindings(); | ||
|
||
// Only compile external code when not type checking | ||
if std::env::var("CARGO_CHECK").unwrap() != "true" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The text below says “The environment variable will not be set…” but this code panics if the environment variable is not set.
if std::env::var("CARGO_CHECK").unwrap() != "true" { | |
if std::env::var("CARGO_CHECK").is_ok() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thank you
Is there a reason why you propose an environment variable instead of just, a You would just have to ensure that this |
Wouldn't this be about the same since Cargo needs to build all One advantage of using an environment variable is that you can compile the build script once and use it for both checking and codegen. |
- `cargo run` | ||
- `cargo test` | ||
|
||
# Drawbacks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to #3748 (comment): as proposed, this is only beneficial in cases where whatever the build script does is more significant than compiling the build script + deps itself. IME this is indeed the common case since often a crate like cmake
or bindgen
is needed to generate the headers, actually compiling only adds cmake --build
.
|
||
This feature primarily benefits library authors who maintain build scripts, especially those working with external code generation and compilation. Regular Rust developers using these libraries will automatically benefit from improved IDE performance without needing to modify their code. | ||
|
||
# Reference-level explanation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should address things like caching and output generation. Quoting @alexcrichton in rust-lang/cargo#4001 (comment):
I'm not 100% certain this still fits in Cargo myself. The major downside of implementing a feature like this is that cargo check && cargo build gets slower than it currently is. Cargo currently caches build script invocations for those two commands, which means that all the work done by cargo check is reused by cargo build and isn't redone.
[...]
As an author of crates like cc and cmake it's also somewhat ambiguous to me about what "check mode" would do for libraries like that. Should they do nothing? Type-check the C code? Make sure it all compiles? Basically I don't think that C/C++ have any real meaningful distinction like Rust does for cargo check and cargo build, so there's not really an obvious choice of what these library crates would do, which would require even more opt-in or configuration on behalf of all users.
High level proposal: for a build script, CHECK_ONLY
should mean:
- Do any relevant checks that don't include codegen, if possible
- Do what is necessary to check the Rust code
- This can always be disregarded in favor of running the full build script
So for C code, CHECK_ONLY
would mean doing the C/++ configure (but not build) step and possibly invoking bindgen. Usually this is ./configure
or cmake
without --build
to verify dependencies and environment, and prepare final headers and source files. bindgen
can then be run on these final headers. !CHECK_ONLY
then means that make
, cmake --build
, or cc
should be run.
OUT_DIR
should be the same between check and build so configuration output can be reused. The build script will probably always run the congigure step, cmake
knows how to skip a lot if files are already up to date 1.
For crates, cc
should totally ignore this environment variable and not change anything - the build script author will just be have the option to skip invoking cc
if CHECK_ONLY
is set. cmake
's Config
should add new methods like only_configure
(always called) and only_build
(called if CHECK_ONLY
is unset), since Config::build
currently does both these things. cmake::build
and Config::build
could optionally listen to CHECK_ONLY
to determine whether to run the only_build
step.
Footnotes
-
Cmake's configure caching can sometimes be pretty slow for bigger projects. It would be possible for build scripts to thumbprint relevant environment and skip invoking
cmake
ifOUT_DIR
has an updated thumbprint, but this would be advanced usage. ↩
|
||
3. **Maintenance Burden**: The Rust and Cargo teams will need to maintain this feature and ensure it remains consistent across different commands and contexts. | ||
|
||
# Rationale and alternatives |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rust-lang/cargo#10126 proposed CARGO_MODE
that adds doc
, test
, doctest
, and a handful of others. Personally I would rather have a single environment variable that says whether or not codegen should happen (as is proposed here) rather than needing to know which modes do and don't require codegen, which isn't robust against adding new modes. Somehow indicating bench/test mode could be useful but this seems better as an independent feature.
That's fair, although I guess that my thought process was that dependencies don't change often and thus can have their builds cached, whereas build scripts would essentially need to be built and run no matter what. Since the RFC mentions rust-analyzer specifically, I'd imagine that the primary case is speeding up type checking between changes of the code when dependencies would already be built, but I guess that also changing the build script would be a potential use case. I guess it is one of those weird cases where compiling two copies only is a big deal if you need to compile both copies often, whereas here you'd likely only be building one or the other. |
A Right now, it is possible for |
That's still possible with it as an environment variable - you could just set a cfg in the build script. That may be obvious, but I do feel like any way for conditional compilation to tell the difference between check and build is probably inoptimal - my current intuition is that check is just build-but-cheaper, and this could change it to something more like 2 different build profiles. |
generate_rust_bindings(); | ||
|
||
// Only compile external code when not type checking | ||
if std::env::var("CARGO_CHECK").is_ok() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be is_err
/`!is_ok'?
Add a new environment variable
CARGO_CHECK
that is set totrue
when runningcargo check
so build scripts can skip expensive compilation steps that are unnecessary for Rust type checking, such as compiling external C++ code in cxx based projects.Rendered