Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some issues with absolute paths in dep-info files. #7137

Merged
merged 7 commits into from
Jul 26, 2019

Conversation

ehuss
Copy link
Contributor

@ehuss ehuss commented Jul 15, 2019

There were some issues with how #7030 was handling translating paths in dep-info files. The main consequence is that when coupled with rust-lang/rust#61727 (which has not yet merged), the fingerprint would fail and be considered dirty when it should be fresh.

It was joining target_root which had 3 different values, but stripping only one. This means for different scenarios (like using --target), it was creating incorrect paths in some cases. For example a target having a proc-macro dependency which would be in the host directory.

The solution here is to always use CARGO_TARGET_DIR as the base that all relative paths are checked against. This should handle all host/target differences.

The tests are marked with #[ignore] because 61727 has not yet merged.

This includes a second commit (which I can open as a separate PR if needed) which is an alternate solution to #7034. It adds dep-info tracking for registry dependencies. However, it tries to limit which kinds of paths it tracks. It will not track package-relative paths (like source files). It also adds an mtime cache to significantly reduce the number of stat calls (because every dependency was stating every file in sysroot).

Finally, I've run some tests using this PR with 61727 in rust-lang/rust. I can confirm that a second build is fresh (it doesn't erroneously rebuild). I can also confirm that the problem in rust-lang/rust#59105 has finally been fixed!

My confidence in all this is pretty low, but I've been unable to think of any specific ways to make it fail. If anyone has ideas on how to better test this, let me know. Also, feel free to ask questions since I've been thinking about this a lot for the past few weeks, and there is quite a bit of subtle stuff going on.

@rust-highfive
Copy link

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 15, 2019
@ehuss
Copy link
Contributor Author

ehuss commented Jul 15, 2019

Marking as draft, as it currently doesn't work on Windows. See rust-lang/rust#61727 (comment)

I'm going to propose the following rollout schedule. I think we should back out #7030 on beta, since it has known bugs I feel uncomfortable leaving it in (this is optional). We can land this PR now in Cargo. Then wait for 1.39 (the next release cycle) to land 61727. Beta-cargo (1.37) won't handle the extra paths properly, so when used in rust-lang/rust I think it will cause too many problems to land 61727 now. I'm not sure if @Mark-Simulacrum is OK with letting that PR wait another 5 weeks. The alternative is to backport this PR to beta, but I feel that this PR is very risky, so I don't feel comfortable doing that (I can be convinced otherwise, though!).

@Mark-Simulacrum
Copy link
Member

I'm okay backing out relevant PRs for beta, especially if they have known bugs. In the meantime, I can start working on moving rust-lang/rust's rustbuild over to this new system (by creating a toolchain for at least x86_64-unknown-linux-gnu via try builders, for example).

Ideally, we'd backport while gating under -Z flag or something like that but I think that's presumably not really viable here?

@@ -1547,6 +1550,9 @@ impl DepInfoPathType {
/// The `rustc_cwd` argument is the absolute path to the cwd of the compiler
/// when it was invoked.
///
/// If the `allow_package` argument is false, then package-relative paths are
/// skipped and ignored.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this comment be expanded to include rationale for why we'd set it to false?

It looked a bit odd for a second filtering here vs filtering when we parse the file, but I don't see any reason why we'd do it the other way around

let path_mtime = match paths::mtime(path) {
Ok(mtime) => mtime,
Err(..) => return Some(StaleFile::Missing(path.to_path_buf())),
let path_mtime = match mtime_cache.entry(path.to_path_buf()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the most worrisome part here to me, using the mtime cache for all calls to paths::mtime. My main worry is how this interacts with edits during the build itself. Can you confirm that find_stale_file here is basically only called during "startup" of Cargo? I think it's fine to, basically during the preparation phase of a build, assume that everything happens instantaneously.

I think that's the case here already since it's the only phase which should have &mut Context, but would be worth double-checking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also you were mentioning that this deduplicates sysroot dependencies, but that's the only mtime calls that are being reduced, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is all done at the beginning.

It can also deduplicate other paths as well. For example, when doing cargo build --tests then main.rs will show up twice (once as a dep of the bin executable, and once as the bin unittest). There are a variety of other situations (using the same modules from different targets, include shenanigans, etc.). I'm fairly confident they should all be fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sounds good to me!

One step we could also do is to store Option<HashMap> and unwrap it here, and then after the build preparation is finished we set it to None to basically internally assert we don't access it again. I don't think that's necessary at this time, though.

} else {
Ok(Some(paths))
}
Ok(Some(paths))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you comment on how come this change was necessary? Is it because package files were filtered out for registry deps?

I'm trying to remember why this case was here but I'm coming up with a blank...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the paths.is_empty() logic was added in #4788 with no comment as to why. The previous logic looks like what's added in this PR. So long as all tests pass seems reasonable to me!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I couldn't figure out why this check was added in 4788. The main gist is that when building a registry dependency with current rustc, the list would end up empty (everything gets filtered out), and cargo would treat it as "dirty" when it should be fine. By instead returning an empty list, it treats it correctly.

@alexcrichton
Copy link
Member

I think that ignoring source-relative files for registry deps for stat information is a great compromise, so the idea behind this PR sounds good and feels sound to me FWIW

@ehuss
Copy link
Contributor Author

ehuss commented Jul 15, 2019

Ideally, we'd backport while gating under -Z flag or something like that but I think that's presumably not really viable here?

Oh, I didn't think about that. I'm probably still not comfortable back porting things since this looks like it has a high risk to me regardless. If you want to add a -Z flag to 61727, then I guess that PR could land right away. We could then make it the default in 1.39. It's up to you, whichever is easiest.

@alexcrichton
Copy link
Member

FWIW forgot to mention but I personally agree with the strategy of "revert beta, land this now, and get rustbuild working ASAP via what's necessary but don't backport this PR"

@Mark-Simulacrum
Copy link
Member

Perhaps I misunderstood -- I thought this PR was necessary to get rust-lang/rust#61727 working properly, even with modifications to rustbuild? In that case it seems we need to backport to beta to get rustbuild working. I'd be happy to do the legwork adding unstable flags or similar wherever necessary!

@ehuss
Copy link
Contributor Author

ehuss commented Jul 16, 2019

Perhaps I misunderstood -- I thought this PR was necessary to get rust-lang/rust#61727 working properly, even with modifications to rustbuild? In that case it seems we need to backport to beta to get rustbuild working. I'd be happy to do the legwork adding unstable flags or similar wherever necessary!

It is required, but from my perspective I feel it is too risky to backport this to beta. I would prefer to wait for the next release cycle before turning 61727 on. If you add a -Z flag to 61727, then you can land it now and we can experiment more with testing it (by setting cargo in config.toml to a version from nightly). And then in 1.39 we can remove the -Z flag and make it the default. Or we can just leave 61727 open until 1.39.

It's unfortunate to delay 4 more weeks, but I don't want to chase potential new issues on beta.

@Mark-Simulacrum
Copy link
Member

Ah, sure, that makes sense. I'll work on making that happen -- I think from a rustc perspective a -Z flag should be fairly easy to implement.

@ehuss ehuss marked this pull request as ready for review July 16, 2019 18:24
@ehuss ehuss marked this pull request as ready for review July 16, 2019 18:24
@ehuss
Copy link
Contributor Author

ehuss commented Jul 16, 2019

I added a hack to deal with Windows extended-length paths. I'm open to other approaches, but I think any other approach is going to be more complex.

I haven't tried to confuse it using symlinks, but I don't think that is a new problem.

} else if let Ok(stripped) = file.strip_prefix(target_root) {
let file = if cfg!(windows) && file.starts_with("\\\\?\\") {
// Remove Windows extended-length prefix, since functions like
// strip_prefix won't work if you mix with traditional dos paths.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this isn't quite right I think because if you strip \\?\C:\foo from \\?\C:\foo\bar you'll get bar. If you strip C:\foo from \\?\C:\foo\bar, however, it doesn't work. I won't pretend to know enough about paths to know whether that's a bug or not though.

I think a better stategy may be to have canonicalized and non-canonicalized versions of the various root paths we're considering here? I think this would definitely be a problem on Unix as well if the project dir or something above it uses a symlink, but stripping both the current absolute dir as well as the canonicalized current absolute dir should solve that as well (and handle this \\?\ business transparently)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fair, I'll give it a shot. It will just be a bit more complex, but hopefully not too much. I'll also try to create a test to see if symlink canonicalization actually is a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with a new approach which always compares canonical paths.

It did reveal that it was not handling symlinks properly, so it should work better on all platforms now.

@ehuss
Copy link
Contributor Author

ehuss commented Jul 25, 2019

Once rust-lang/rust#61727 is in nightly, I'll update the tests here.

@bors
Copy link
Contributor

bors commented Jul 25, 2019

☔ The latest upstream changes (presumably #7175) made this pull request unmergeable. Please resolve the merge conflicts.

This adds dep-info tracking for non-path dependencies. This avoids tracking
package-relative paths (like source files) in non-path dependencies, since we
assume they are static. It also adds an mtime cache to improve performance since
when rustc starts tracking sysroot files (in the future), it can cause a large
number of stat calls.
Instead of treating Windows differently, this just always uses canonical paths
on all platforms.  This fixes a problem where symlinks were not treated
correctly on all platforms.

Switching rm_rf to the remove_dir_all crate because deleting symbolic links on
Windows is difficult.
@ehuss
Copy link
Contributor Author

ehuss commented Jul 26, 2019

I added a -Z binary-dep-depinfo flag that gets passed on to rustc, and updated the tests. This should be good for review now.

@alexcrichton
Copy link
Member

@bors: r+

I'm not 100% certain it's ok to rely so heavily on canonicalize, but I think this should be good enough for now and we can tweak as necessary. Thanks @ehuss!

@bors
Copy link
Contributor

bors commented Jul 26, 2019

📌 Commit c6e626b has been approved by alexcrichton

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 26, 2019
@bors
Copy link
Contributor

bors commented Jul 26, 2019

⌛ Testing commit c6e626b with merge 1dd0215...

bors added a commit that referenced this pull request Jul 26, 2019
Fix some issues with absolute paths in dep-info files.

There were some issues with how #7030 was handling translating paths in dep-info files. The main consequence is that when coupled with rust-lang/rust#61727 (which has not yet merged), the fingerprint would fail and be considered dirty when it should be fresh.

It was joining [`target_root`](https://github.com/rust-lang/cargo/blame/1140c527c4c3b3e2533b9771d67f88509ef7fc16/src/cargo/core/compiler/fingerprint.rs#L1352-L1360) which had 3 different values, but stripping [only one](https://github.com/rust-lang/cargo/blame/1140c527c4c3b3e2533b9771d67f88509ef7fc16/src/cargo/core/compiler/mod.rs#L323). This means for different scenarios (like using `--target`), it was creating incorrect paths in some cases. For example a target having a proc-macro dependency which would be in the host directory.

The solution here is to always use CARGO_TARGET_DIR as the base that all relative paths are checked against. This should handle all host/target differences.

The tests are marked with `#[ignore]` because 61727 has not yet merged.

This includes a second commit (which I can open as a separate PR if needed) which is an alternate solution to #7034. It adds dep-info tracking for registry dependencies. However, it tries to limit which kinds of paths it tracks. It will not track package-relative paths (like source files). It also adds an mtime cache to significantly reduce the number of stat calls (because every dependency was stating every file in sysroot).

Finally, I've run some tests using this PR with 61727 in rust-lang/rust. I can confirm that a second build is fresh (it doesn't erroneously rebuild). I can also confirm that the problem in rust-lang/rust#59105 has *finally* been fixed!

My confidence in all this is pretty low, but I've been unable to think of any specific ways to make it fail. If anyone has ideas on how to better test this, let me know. Also, feel free to ask questions since I've been thinking about this a lot for the past few weeks, and there is quite a bit of subtle stuff going on.
@bors
Copy link
Contributor

bors commented Jul 26, 2019

☀️ Test successful - checks-azure
Approved by: alexcrichton
Pushing 1dd0215 to master...

@bors bors merged commit c6e626b into rust-lang:master Jul 26, 2019
bors added a commit to rust-lang/rust that referenced this pull request Aug 1, 2019
Update cargo, rls

## cargo

12 commits in d0f828419d6ce6be21a90866964f58eb2c07cd56..26092da337b948719549cd5ed3d1051fd847afd7
2019-07-23 21:58:59 +0000 to 2019-07-31 23:24:32 +0000
- tests: Enable features to fix unstabilized `#[bench]` (rust-lang/cargo#7198)
- Fix excluding target dirs from backups on OSX (rust-lang/cargo#7192)
- Handle symlinks to directories (rust-lang/cargo#6817)
- Enable pipelined compilation by default (rust-lang/cargo#7143)
- Refactor resolve `Method` (rust-lang/cargo#7186)
- Update `cargo_compile` module doc. (rust-lang/cargo#7187)
- Clean up TargetInfo (rust-lang/cargo#7185)
- Fix some issues with absolute paths in dep-info files. (rust-lang/cargo#7137)
- Update the `url` crate to 2.0 (rust-lang/cargo#7175)
- Tighten requirements for git2 crates (rust-lang/cargo#7176)
- Fix a deadlocking test with master libgit2 (rust-lang/cargo#7179)
- Fix detection of cyclic dependencies through `[patch]` (rust-lang/cargo#7174)

## rls

1 commits in 70347b5d4dfe78eeb9e6f6db85f773c8d43cd22b..93d9538c6000fcf6c8da763ef4ce7a8d407b7d24
2019-07-30 12:56:38 +0200 to 2019-07-31 21:42:49 +0200
- Update cargo (rust-lang/rls#1529)
bors added a commit that referenced this pull request Aug 6, 2019
Fix remap-path-prefix from failing.

rustc currently remaps the source paths in the dep-info file, causing them to potentially point to a non-existent path. This causes the call to `canonicalize` added in #7137 to fail.  Just ignore the error.

Closes #7211
@ehuss ehuss added this to the 1.38.0 milestone Feb 6, 2022
@ehuss ehuss mentioned this pull request Nov 3, 2022
bors added a commit that referenced this pull request Nov 4, 2022
Remove remove_dir_all

This removes the `remove_dir_all` dependency. It is no longer needed, as there has been significant improvements to std's implementation over the years (particularly recently). This improves the performance of running cargo's testsuite on my machine from 8m to 2m 30s (!!).

This was added in #7137 to deal with some issues with symbolic links. I believe those seem to be resolved now.

Note that std's implementation is still not bullet proof. In particular, I think there are still some cases where a file can be locked by another process which prevents it from being removed. There are some more notes about this at #7042 (comment) (and various other places linked there).

Closes #9585
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants