Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite miri script in rust #2909

Merged
merged 3 commits into from
Jul 31, 2023

Conversation

osiewicz
Copy link
Contributor

@osiewicz osiewicz commented May 31, 2023

This is a sketch of a rewrite of miri script in Rust. It does not include changes made in #2908 yet. Environment variables are not properly propagated yet, which is something I plan to address.

This PR is mostly a heads-up about the ongoing effort and it's state.
It's definitely not the cleanest code I've seen in my life, but my first goal was feature/interface parity. I will iterate on it a bit before marking it as ready.

I wonder though how this should be integrated/tested. Are you aware of anyone using ./miri in their scripts?
I guess we should keep existing miri script in place and let it run miri-script package directly?

CI should probably cargo check this package as well.

Fixes #2883

@osiewicz osiewicz marked this pull request as draft May 31, 2023 19:18
@RalfJung
Copy link
Member

RalfJung commented Jun 8, 2023

Thanks for the PR! Someone will take a look and review this eventually, we're just all busy currently -- sorry for the delay.

@osiewicz
Copy link
Contributor Author

osiewicz commented Jun 8, 2023

Hey,
No problem - that's totally understandable. I intend to polish this before marking this PR as ready for review. Sadly I couldn't find time for that (or anything, really) recently.

Miri script works as is, we don't need to rush this in.

@saethlin
Copy link
Member

saethlin commented Jun 9, 2023

I wonder though how this should be integrated/tested.

FWIW I'm not particularly worried about testing this code. Obviously it would be nice if it had good tests, but a lot of this is mucking about with the state of "the system", and the parts of this that really matter will get exercised regularly by us. Write tests if they help you build confidence that this works. But I think it would be easy to go overboard and spend more effort on the test suite than the tool.

Are you aware of anyone using ./miri in their scripts?

I don't think this is particularly worth worrying about.

I guess we should keep existing miri script in place and let it run miri-script package directly?

Yes. ./miri is much shorter than cargo run --manifest-path=miri-script/Cargo.toml -- . It would be neat if the script were turned into the thinnest wrapper around that. User workflows could be unaffected, the CI setup could stay as-is, and so on.

authors = ["Miri Team"]
description = "Helpers for miri maintenance"
license = "MIT OR Apache-2.0"
name = "miri"
Copy link
Member

@saethlin saethlin Jun 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone cargo installs this, unless they have a creative setup, this will replace the rustup proxy miri executable and break their install (this situation is not trivial to recover from and really frustrates people). The bin target here should be called something else, or otherwise prevent that situation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point, I'll rename binary to miri-script then.

Install {
/// Flags that are passed through to `cargo install`.
#[arg(trailing_var_arg = true)]
flags: Vec<String>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is String the right type to use for a passed-through CLI argument? If the arguments are literally just passed through, I think OsString is better.

.filter(|val| !val.is_empty())
.unwrap_or_else(|| {
let target_dir = path!(ret.miri_dir / "target");
target_dir.to_string_lossy().to_string()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment on all the use of to_string_lossy as I had on pass-through CLI arguments. Perhaps that means we also need env::var_os?

@saethlin
Copy link
Member

saethlin commented Jun 9, 2023

I'm done leaving comments. I think this is an amazing start, and I'd love to try it out when the problem with unset CARGO_EXTRA_FLAGS is addressed.

Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great start indeed, thanks a lot. :) I left a bunch of notes from a first pass. My main concern is verbosity: this is a lot longer than the shell script it is porting. I think this can be reduced be avoiding all the conversion from paths to strings and whatnot -- no such conversion is needed when working with Command directly, I hope we can also make that work here.

Also command.rs could use some splitting up into multiple files: all the cargo-wrapping stuff, the pull/push stuff, and the other stuff (fmt, bench, many-seeds, toolchain)

As usual please don't force-push this branch, but add new commits, so that I can see more easily what changed compared to last time.

which = "4.4"
walkdir = "2.3"
itertools = "0.10"
path_macro = "1.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing newline at end of file

Comment on lines 40 to 44
Bless {
/// Flags that are passed through to `cargo test`.
#[arg(trailing_var_arg = true)]
flags: Vec<String>,
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, what we really want is ./miri test --bless; that just was too annoying to implement with the shell script.

Feel free to just leave a FIXME to do this later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, it's fine - I've thought that --bless would be better than having a separate command for it, but I wanted to stay as close to script as possible. If we're fine with such "breaking" changes, that's fine with me.

},
/// Build miri, set up a sysroot and then run the driver with the given <flags>.
/// (Also respects MIRIFLAGS environment variable.)
RunDep {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably also eventually become ./miri run --dep or so.

}
fn get_auto_config(&self, miri_dir: &Path) -> Option<AutoConfig> {
use std::env::VarError;
if let Err(VarError::NotPresent) = std::env::var("MIRI_AUTO_OPS") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use std::env::var_os(...).is_none()

fn get_auto_config(&self, miri_dir: &Path) -> Option<AutoConfig> {
use std::env::VarError;
if let Err(VarError::NotPresent) = std::env::var("MIRI_AUTO_OPS") {
std::env::set_var("MIRI_AUTO_OPS", "42");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be needed any more; the shell script did this because it recursively invoked itself to run the auto-ops, but we do no such nonsense here.

miri-script/src/commands.rs Outdated Show resolved Hide resolved
let toolchain = self.get_toolchain_override();
let command =
format!("cargo {} miri run --manifest-path {current_bench_dir}", toolchain);
cmd("hyperfine", ["-w", "1", "-m", "5", "--shell=none", &command]).run()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this changed in the shell script recently, so the rust version needs updating

Comment on lines 240 to 249
let commit_hash_line = rustc_info
.lines()
.find(|line| line.starts_with("commit-hash: "))
.expect("rustc output does not contain commit-hash line");
commit_hash_line
.split_whitespace()
.nth(1)
.expect("rustc commit-hash line is malformed")
.trim()
.to_owned()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think rustc_version can replace this ad-hoc parser

miri-script/src/commands.rs Outdated Show resolved Hide resolved
miri-script/src/commands.rs Outdated Show resolved Hide resolved
miri-script/src/main.rs Outdated Show resolved Hide resolved
self.many_seeds(&command, *seed_start, *seeds),
Subcommands::ManySeeds { command, seed_start, seeds } => {
self.many_seeds(&command, *seed_start, *seeds)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You keep changing back and forth between having and not having braces here in every other commit. That makes the commit-by-commit review unnecessarily noisy. What is going on? Almost looks like you are running rustfmt without applying Miri's rustfmt.toml sometimes, and sometimes with the toml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I use multiple setups so it might be the case. I am sorry, I'll try to keep it consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever you are using to format your checkout should be able to pull in Miri's rustfmt.toml. Sometimes editors have difficulty with Miri or rustc because they aren't just standard Cargo projects. Can you take a skim over the CONTRIBUTING.md and see if anything there looks helpful? If there isn't, can you maybe open an issue or start a topic on the Zulip so we can figure out what Miri can do to improve this?

Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xshell was a great choice, this looks a lot better. :)

miri Outdated Show resolved Hide resolved
miri-script/src/commands.rs Outdated Show resolved Hide resolved
}

#[derive(Clone, Debug, Default)]
struct Extras {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is not very descriptive. Why is this a separate type anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it to essentially ensure that these variables are always available in env - and not have to deal with environment lookup in code paths that do use it. My stream of thought was "if we have a valid Extras object, sys root and co were obtained successfully".

And yeah, the name sucks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, MiriRunner uses Extras::default in its new function, which sucks even more. Some of the member functions might see an empty sys root/libdir.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a problem that we had in the shell script as well: these variables need to be determined after the auto-toolchain update happened.

I certainly wouldn't want them to be in env, I was expecting them to be fields in MiriRunner. But we don't have the values always available.

libdir is only needed exactly once. I think we can just remove the field, and instead have a fn libdir that computes the libdir.

For sysroot I propose a cache: have a field sysroot: Option<PathBuf> in MiriRunner, and a fn sysroot() that computes the sysroot and caches the result in self.sysroot. Then we just need to make the toolchain command reset that cache and we should be good, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, these are good ideas. And yeah, I agree that neither sysroot nor libdir should be in env.

let config = command.get_auto_config(&miri_dir).filter(|_| run_auto_things);
// CARGO_EXTRA_FLAGS do not have to be a valid UTF-8, but that's what shell_words' expects.
let cargo_extra_flags = std::env::var("CARGO_EXTRA_FLAGS").unwrap_or_default();
let cargo_extra_flags = shell_words::split(&cargo_extra_flags)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check how cargo treats RUSTFLAGS... we should use the exact same logic. I don't think it uses shell_words for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Run this first, so that the toolchain doesn't change after
// other code has run.
let command = Subcommands::Toolchain { flags: vec![] };
Self::exec_inner(&command, false)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't this just be self.toolchain? (similar for the other uses of exec_inner)

The control flow right now is rather hard to follow here

Copy link
Contributor Author

@osiewicz osiewicz Jul 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've figured that exec_inner resembles old script a bit better (we start off a fresh MiriRunner instance), but I'd be happy to change it.
E.g: in fn bench we can just call exec_inner without worrying about auto things being executed recursively or environment not being set up proper for install.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The control flow in the script was terrible though.^^ We should initialize things once and then in bench we know things are initialized enough for install.

println!("This will pull a copy of the rust-lang/rust history into this Miri checkout, growing it by about 1GB.");
println!("To avoid that, abort now and set the RUSTC_GIT environment variable to an existing rustc checkout. Proceed? [y/N] ");
let mut answer = String::new();
let _ = std::io::stdin().read_line(&mut answer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why ignore errors here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, I guess I did it out of laziness (not having to deal with type conversions to satisfy unwrap_or_else et al was nice).
I've moved this snippet to if let now.

) -> Result<()> {
let all_targets: Option<&OsStr> = all_targets.then_some("--all-targets".as_ref());
let sh = Shell::new()?;
cmd!(sh, "cargo {toolchain} check {extra_flags...} --manifest-path {path} {all_targets...} {args...}").run()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cmd!(sh, "cargo {toolchain} check {extra_flags...} --manifest-path {path} {all_targets...} {args...}").run()?;
cmd!(sh, "cargo +{toolchain} check {extra_flags...} --manifest-path {path} {all_targets...} {args...}").run()?;

Missing +?

Comment on lines 503 to 506
Self::build_package(
miri_manifest.as_ref(),
&toolchain,
&self.cargo_extra_flags,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you make this a method (self.build_package), you don't have to pass toolchain and cargo_extra_flags.

)?;
self.find_sysroot()?;
if bless {
self.set_env("MIRI_BLESS", "Gesundheit");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems better to just use sh.set_env here, and keep this local to this test invocation.


#[cfg(test)]
mod tests {
use shell_words::split;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you testing another crate's function here...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left that in while looking for a way to split the shell words. I'll remove that, I am sorry. :(


/// Prepare and set MIRI_SYSROOT. Respects `MIRI_TEST_TARGET` and takes into account
/// locally built vs. distributed rustc.
fn find_sysroot(&mut self) -> Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably be find_miri_sysroot to avoid confusion with extras.sysroot.

Same for build_sysrootbuild_miri_sysroot

@bors
Copy link
Collaborator

bors commented Jul 2, 2023

☔ The latest upstream changes (presumably #2948) made this pull request unmergeable. Please resolve the merge conflicts.

@osiewicz
Copy link
Contributor Author

osiewicz commented Jul 2, 2023

@RalfJung it looks like there's an issue with ./miri toolchain where on Windows setting an override for a directory doesn't affect subsequent call to rustc --print sysroot. Using rustup run rustc --print sysroot does respect the override.
Obviously Miri-script does something incorrectly, as the master version works just fine.
I'm not sure what I'm doing wrong; I've pulled so much hair on this that at this point I might have to see a barber. :/

@RalfJung
Copy link
Member

RalfJung commented Jul 7, 2023

it looks like there's an issue with ./miri toolchain where on Windows setting an override for a directory doesn't affect subsequent call to rustc --print sysroot. Using rustup run rustc --print sysroot does respect the override.

I assume the difference is that miri-script itself is run inside a cargo run which runs in rustup. So some rustup env vars are set which take precedence over the toolchain file. We should probably unset them / not forward them to programs we invoke.

@osiewicz
Copy link
Contributor Author

osiewicz commented Jul 7, 2023

Hm, that sounds plausible. What do you say to having something akin to:
cargo build --manifest-path "$(dirname "$0")"/miri-script/Cargo.toml && "$(dirname "$0")"/miri-script/target/debug/miri-script $@
instead of:
cargo run --manifest-path "$(dirname "$0")"/miri-script/Cargo.toml -- $@
The former should not go through rustup, so it won't set its environment variables - but it will still be possible to set them just in case someone wants to do that.

@RalfJung
Copy link
Member

RalfJung commented Jul 7, 2023 via email

@bors
Copy link
Collaborator

bors commented Jul 18, 2023

☔ The latest upstream changes (presumably #2986) made this pull request unmergeable. Please resolve the merge conflicts.

@RalfJung RalfJung force-pushed the rewrite-miri-script-in-rust-2883 branch from 57cafde to d713c23 Compare July 30, 2023 17:58
@RalfJung RalfJung changed the title [WIP] Rewrite miri script in rust #2883 Rewrite miri script in rust Jul 30, 2023
@RalfJung RalfJung marked this pull request as ready for review July 30, 2023 17:59
@RalfJung
Copy link
Member

I'm going to take over this PR -- at this point I think it is easier if I just implement things the way I prefer, rather than trying to explain to you how I would like this to be changed. :) Your groundwork has been extremely helpful for this, thanks a ton!

@RalfJung
Copy link
Member

Looks like there is a problem (which the old code work around in an incorrect way)... for things like ./miri clippy -- -D warnings, we want all of -- -D warnings to be forwarded to clippy. That is, we don't want the clap parser to intercept this -- at all.

Is that possible?

@osiewicz
Copy link
Contributor Author

Hi,
I am sorry for not carrying this further - I've had less time recently for OS work. :/
As for the "--" forwarding issue, I believe I've annotated clippy and fmt subcommands with allow_leading_hyphen. I recall that there were still issues with it tho.
Another problem with what I've had up to this point was that new fmt was noticeably slower than the old one. I suppose it came down to formatting files sequentially Vs using xargs in a shell script - which could've parallelized different rustfmt instances under the hood.

@RalfJung
Copy link
Member

Another problem with what I've had up to this point was that new fmt was noticeably slower than the old one.

I fixed that by running rustfmt a single time on all the files, instead of having a new rustfmt invocation for each file. (That matches what the old script does.)

As for the "--" forwarding issue, I believe I've annotated clippy and fmt subcommands with allow_leading_hyphen. I recall that there were still issues with it tho.

That's not sufficient -- as the clap docs for trailing_var_arg say, "users still have the option to explicitly escape ambiguous arguments with --". I don't know if clap has a way to to disable this. If not, we won't be able to use clap.

All the CI failures currently are caused by this.

@RalfJung RalfJung force-pushed the rewrite-miri-script-in-rust-2883 branch 2 times, most recently from d9e8f3c to 6748f1b Compare July 30, 2023 18:43
let toolchain: &OsStr = self.active_toolchain.as_ref();
cmd!(
sh,
"cargo +{toolchain} test {extra_flags...} --manifest-path {miri_manifest} -- {flags...}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The -- {flags..} here is what got CI to work despite the parser issue, but it breaks other things -- for instance, ./miri test --all-features should pass that flag to cargo (before the --), not to the test runner. (Same for clippy.)

@RalfJung RalfJung force-pushed the rewrite-miri-script-in-rust-2883 branch 9 times, most recently from b08f1f0 to e5cccf3 Compare July 31, 2023 07:26
@RalfJung RalfJung force-pushed the rewrite-miri-script-in-rust-2883 branch from e5cccf3 to e5b358a Compare July 31, 2023 07:54
@RalfJung
Copy link
Member

@bors r+

@bors
Copy link
Collaborator

bors commented Jul 31, 2023

📌 Commit e5b358a has been approved by RalfJung

It is now in the queue for this repository.

@bors
Copy link
Collaborator

bors commented Jul 31, 2023

⌛ Testing commit e5b358a with merge 273cdab...

@bors
Copy link
Collaborator

bors commented Jul 31, 2023

☀️ Test successful - checks-actions
Approved by: RalfJung
Pushing 273cdab to master...

@bors bors merged commit 273cdab into rust-lang:master Jul 31, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rewrite miri script in Rust
4 participants