Speed up --all on large workspaces #4722

matklad · 2021-02-25T16:26:40Z

I've noticed that cargo fmt --all takes an unresonable 9 seconds on https://github.com/near/nearcore. The culprit seems to be that we call cargo metadata on every package in the workspace, but their metas are the same!

calebcartwright · 2021-02-25T17:40:37Z

Thanks for the PR, but I think this is going to cause breakage in the required behavior (discussed in detail in #4247, summarized in #4247 (comment)). There's been some very recent updates to the cargo metadata output that will allow us to maintain the format-the-entire-world behavior that's required of --all, but that's not yet been incorporated so the usage of --all still incurs the associated perf hits

matklad · 2021-02-25T18:28:38Z

I believe this PR preserves existing behavior (but I didn’t look too closely here).

calebcartwright · 2021-02-25T19:00:54Z

I believe this PR preserves existing behavior (but I didn’t look too closely here).

I'm on a phone rk now so haven't taken too close a look yet either. However, even if it doesn't cause any missed implicit members, I'm not sure it will do much in the way of addressing the real problem. When running with --all it's necessary to run cargo metadata without the --no-deps option for every encountered member, and every local dep. That requires the hit on the registry index which is the biggest cause on performance by a long, long shot.

If there's somehow scenarios where we end up running cargo metadata (sans --no-deps) more than once per package then sure happy to skip the extra call. I just wouldn't expect that to have much of a tangible impact on improving performance given the extreme hit of the required first call.

The major benefit here is going to come from overhauling all of this to take advantage of the updated cargo metadata output so that we can always run with --no-deps

matklad · 2021-02-25T19:28:35Z

it's necessary to run cargo metadata without the --no-deps option for every encountered member

I believe this is wrong. Metadata output is exactly the same for every workspace member (only root field is different, but that's effectively deprecated).

22:26:15|~/projects/nearcore|fmt-test⚡?
λ cargo metadata | python3 -m json.tool > a.json
warning: please specify `--format-version` flag explicitly to avoid compatibility problems

22:26:43|~/projects/nearcore|fmt-test⚡?
λ pushd runtime/runtime && cargo metadata | python3 -m json.tool > ../../b.json && popd
warning: please specify `--format-version` flag explicitly to avoid compatibility problems

22:26:54|~/projects/nearcore|fmt-test⚡?
λ diff a.json b.json
136208c136208
<         "root": "nearcore 0.1.0 (path+file:///home/matklad/projects/nearcore)"
---
>         "root": "node-runtime 3.0.0 (path+file:///home/matklad/projects/nearcore/runtime/runtime)"

calebcartwright · 2021-02-25T19:37:55Z

I believe this is wrong. Metadata output is exactly the same for every workspace member

I appreciate that point of view, as it's the initial reaction most folks (including myself 😄) have. However, there's a really critical albeit subtle different between the outputs that makes all the difference.

When running with --no-deps the dependency nodes for the packages do not have any information that can be used to locate local dependencies (rust-lang/cargo#7483). In order to support formatting implicit workspace members, rustfmt has to be able to locate those implicit members, and it's not possible to do this consistently from the metadata output produced with --no-deps because the source field is nulled out. We previously used to make assumptions about the file system based on the dependency names, but this was also proven to not be viable since there's no guarantee they will match and we had several folks report bugs seeing that in reality.

The recent updates in cargo have finally enabled us to work past this issue.

matklad · 2021-02-25T20:13:36Z

The --no-deps vs no --no-deps is orthogonal to this PR.

The problem is that, for a workspace with N members, cargo fmt does 2N calls to metadata, but all but the two of calls return exactly the same json. See https://github.com/matklad/wsmeta for a demonstration. After this PR, only 2 calls will be made.

If --no-deps issue is fixed (without this PR), then the number of calls is reduced from 2N to N which, while admirable (especially due to slashing costly network IO), doesn't change the big O.

I might be wrong here, and there might be cases where calling cargo metadata [--no-deps] on two different members of the same workspace yields different results, but so far I wasn't able to observe this, and my understanding is that this shouldn't be the case.

calebcartwright · 2021-02-25T22:33:53Z

The --no-deps vs no --no-deps is orthogonal to this PR.

I think we may be talking past each other a bit, though that's my fault for not articulating my thinking very well. In short, I'm really just trying to think through whether there's any edge cases where this could be problematic, and if not, whether such a tactical improvement on the current flow would be worthwhile given the strategic plan (driven by --no-deps stuff) which I anticipate will involve a rewrite or at least a heavy refactoring of the same code.

To expand on some of my earlier comments, if there's cases where we can improve the current flow without missing a package in some edge case then that's always a plus. My sense from the changes/discussion is that there seems to be such a possible improvement in some explicit workspace cases. I do have a vague recollection of some odd edge cases that were reported a couple years ago, but happy to take a deeper look to rule out any such issues.

If --no-deps issue is fixed (without this PR), then the number of calls is reduced from 2N to N which, while admirable (especially due to slashing costly network IO), doesn't change the big O.

If we can avoid the pair of cargo metadata/cargo metadata --no-deps calls then that's of course good thing. My sense though (admittedly haven't tried to time it yet) is that the initial upfront cargo metadata call that's still going to be in place will continue to be the biggest drag on performance.

E.g. if cargo fmt --all still takes 7-8 seconds on the nearcore project even after the change then I don't think we've bought ourselves all that much from an end user experience, even if things have been improved and optimized within.

Still want to get to a conclusion on this regardless, but the cargo fmt --all perf is a big issue that's problematic for a lot of folks and I'm trying to first focus on the thing that will give the biggest improvement.

I'd already started working on the --no-deps related changes, but will also take a look at ruling out any potential edge case side effects from this in parallel. If you're willing and have bandwidth to check what kind of performance improvement your changes provide that'd be most appreciated! If it looks like it helps then I'm happy to pause the --no-deps work for a bit to try to get this incorporated and released 👍

matklad · 2021-02-26T08:15:52Z

Here are the timings before this change:

11:10:25|~/projects/nearcore|fmt-test⚡?
λ t ../rustfmt/target/release/cargo-fmt fmt --all
real 12.12s
user 10.57s
sys  1.54s
rss  258636k

11:10:46|~/projects/nearcore|fmt-test⚡?
λ t ../rustfmt/target/release/cargo-fmt fmt 
real 0.96s
user 0.94s
sys  0.02s
rss  19184k

Here are the timings after this change:

11:10:50|~/projects/nearcore|fmt-test⚡?
λ t ../rustfmt/target/release/cargo-fmt fmt --all
real 1.24s
user 1.18s
sys  0.05s
rss  61332k

11:11:06|~/projects/nearcore|fmt-test⚡?
λ t ../rustfmt/target/release/cargo-fmt fmt 
real 0.96s
user 0.94s
sys  0.01s
rss  19260k

I also tried to estimate the speedup we'd get by getting rid of the run without --no-deps, but, while doing so, I've noticed that we already pass --offline flag.

So my current belief is that this PR substantially improves pref due to big-O change, while the --no-deps fix will bring only marginal benefits.

calebcartwright · 2021-02-26T14:11:50Z

Here are the timings before this change:

🎉 That looks great thanks for sharing, I will be ecstatic to be wrong about this!

By any chance did you test a subsequent local run without changes and/or a run with the changes but with a clean registry cache? Want to completely rule out any potential inflation on the pre-changes run due to it being the one that has that first registry hit

I also tried to estimate the speedup we'd get by getting rid of the run without --no-deps, but, while doing so, I've noticed that we already pass --offline flag.

You're far more familiar with the innards of cargo than I so please let me know if I'm wrong on this one, but my understanding is that the --offline flag is more of a best effort/most-of-the-time and that metadata was one of those cases where --offline ended up being moot. Based on past discussions my impression was that it wasn't possible for cargo to avoid the net with cargo metadata so that even with --offline the registry index hit is still there.

So my current belief is that this PR substantially improves pref due to big-O change, while the --no-deps fix will bring only marginal benefits.

I'm still skeptical of that latter part of the statement. We've seen/heard repeatedly about the network/registry hits with --all and I think that's still going to be there, particularly in cases like CI environments. Also while the changes here would be beneficial for explicit workspaces, it won't change the story for implicit workspaces that will still suffer greatly if the registry hit is as problematic as I still expect it to be.

matklad · 2021-02-26T14:42:20Z

but my understanding is that the --offline flag is more of a best effort/most-of-the-time and that metadata was one of those cases where --offline ended up being moot.

--offline happened after I had stopped being actively involved in cargo development, but I am 90% sure that cargo itself will never access network if --offline is passed. --offline is best effort in terms of dependency resolution, not in terms of accessing the network. That is, with --offline cargo may sometime error, where it'd succeeded otherwise. For example, if I trash ~/.cargo/registry/index I get the following result:

17:15:21|~/tmp/h|HEAD⚡?
λ cargo metadata --offline 
warning: please specify `--format-version` flag explicitly to avoid compatibility problems
error: no matching package named `regex` found
location searched: registry `https://github.com/rust-lang/crates.io-index`
required by package `h v0.1.0 (/home/matklad/tmp/h)`
As a reminder, you're using offline mode (--offline) which can sometimes cause surprising resolution failures, if this error is too confusing you may wish to retry without the offline flag.

If people observe network activity with --offline, this most likely means one of the following:

someone somewhere is actually running cargo without --offline flag
there's a bug in cargo

By any chance did you test a subsequent local run without changes and/or a run with the changes but with a clean registry cache?

Trying that right now. As expected, something somewhere is running cargo metadata without --offline flag :D

matklad · 2021-02-26T14:47:13Z

As expected, something somewhere is running cargo metadata without --offline flag :D

Ah, it's cargo fmt which just re-tries without --offline if --offline fails:

rustfmt/src/cargo-fmt/main.rs

Lines 678 to 682 in 2e1a982

    
           cmd.other_options(vec![]); 
        
           match cmd.exec() { 
        
               Ok(metadata) => Ok(metadata), 
        
               Err(error) => Err(io::Error::new(io::ErrorKind::Other, error.to_string())), 
        
           }

matklad · 2021-02-26T15:01:01Z

Also while the changes here would be beneficial for explicit workspaces, it won't change the story for implicit workspaces that will still suffer greatly if the registry hit is as problematic as I still expect it to be.

TBH, I feel that 90% percent of people who type cargo fmt --all (myself included) expect it to behave like cargo's --workspace. "implicit workspace" isn't really a thing anywhere outisde of rustfmt.

calebcartwright · 2021-09-24T02:12:40Z

Apologies for the long delay on this, I let the testing stack up and then got pulled off in other directions.

Thanks again for the suggested changes, and while we will indeed incorporate your suggestion, I'm going to close this in favor of #4997.

The change as originally proposed probably would have been fine. However, it's an area that's been a recurring source of subtle bugs and one where we have really poor test coverage which always gives us pause. Plus as noted earlier there's other problem scenarios that would've been unaddressed so by going with the cargo updates instead it'll both definitely allow us to pull in your optimization and it also solves those other problems.

As a reference, here's what I got when running with a clean cargo cache using just your proposed change (I manually made the misformatting change on my local copy of the nearcore repo to provide a visual output, some file paths removed):

..../rustfmt-sandbox/nearcore$ time ~/..../rustfmt/target/release/cargo-fmt --all --check
Diff in /..../rustfmt-sandbox/nearcore/core/account-id/src/borsh.rs at line 4:
 
 use borsh::{BorshDeserialize, BorshSerialize};
 
-    impl BorshSerialize for AccountId {
+impl BorshSerialize for AccountId {
     fn serialize<W: Write>(&self, writer: &mut W) -> Result<(), Error> {
         self.0.serialize(writer)
     }

real	2m32.073s
user	0m30.198s
sys	0m2.615s

Admittedly I'm using some rather pedestrian hardware and am on a fairly spotty wifi, but that really just underscores the point. Neither cargo-fmt nor rustfmt truly need an internet connection nor the associated index hit, but cargo-fmt had to do it in order to support a handful of frustrating edge cases. That's caused some real challenges for users that have been reported, both in network-restricted scenarios, and more broadly in ephemeral/CI type environments that often dealt with an empty cache.

Additionally, and as discussed above/in prior issues, even once everything is cached the current/unmodified version of cargo fmt still takes an abysmal amount of time:

..../rustfmt-sandbox/nearcore$ time ~/..../rustfmt/target/release/cargo-fmt --all --check
Diff in /..../rustfmt-sandbox/nearcore/core/account-id/src/borsh.rs at line 4:
 
 use borsh::{BorshDeserialize, BorshSerialize};
 
-    impl BorshSerialize for AccountId {
+impl BorshSerialize for AccountId {
     fn serialize<W: Write>(&self, writer: &mut W) -> Result<(), Error> {
         self.0.serialize(writer)
     }

real	0m32.709s
user	0m27.853s
sys	0m4.516s

Fortunately, this is improved dramatically just from dropping the cargo metadata calls that had to be made without the --no-deps flags (which doesn't even include your optimization yet):

..../rustfmt-sandbox/nearcore$ time ..../rustfmt/target/release/cargo-fmt --all --check
Diff in /..../rustfmt-sandbox/nearcore/core/account-id/src/borsh.rs at line 4:
 
 use borsh::{BorshDeserialize, BorshSerialize};
 
-    impl BorshSerialize for AccountId {
+impl BorshSerialize for AccountId {
     fn serialize<W: Write>(&self, writer: &mut W) -> Result<(), Error> {
         self.0.serialize(writer)
     }

real	0m6.090s
user	0m5.606s
sys	0m0.491s

This continues to hold true even with a completely clean cache:

..../rustfmt-sandbox/nearcore$ time ..../rustfmt/target/release/cargo-fmt --all --check
Diff in /..../rustfmt-sandbox/nearcore/core/account-id/src/borsh.rs at line 4:
 
 use borsh::{BorshDeserialize, BorshSerialize};
 
-    impl BorshSerialize for AccountId {
+impl BorshSerialize for AccountId {
     fn serialize<W: Write>(&self, writer: &mut W) -> Result<(), Error> {
         self.0.serialize(writer)
     }

real	0m6.074s
user	0m5.541s
sys	0m0.535s

Those massive improvements in both cached and uncached environments should make for very real, and noticeable experience improvements for users. Furthermore, with the cargo metadata updates that guarantee every node, package or dependency, will have the requisite information, we can pull in your suggestion to speed things up even more:

..../rustfmt-sandbox/nearcore$ time ..../rustfmt/target/release/cargo-fmt --all --check
Diff in /..../rustfmt-sandbox/nearcore/core/account-id/src/borsh.rs at line 4:
 
 use borsh::{BorshDeserialize, BorshSerialize};
 
-    impl BorshSerialize for AccountId {
+impl BorshSerialize for AccountId {
     fn serialize<W: Write>(&self, writer: &mut W) -> Result<(), Error> {
         self.0.serialize(writer)
     }

real	0m1.985s
user	0m1.924s
sys	0m0.061s

Which now has cargo fmt --all running even with cargo fmt, regardless of the state of the cache.

Ah, it's cargo fmt which just re-tries without --offline if --offline fails:

Yup, was one of those unfortunate but necessary things that was needed before in order to ensure the path information for dependency nodes would be available.

TBH, I feel that 90% percent of people who type cargo fmt --all (myself included) expect it to behave like cargo's --workspace.

I know what you mean. I've been making the case that we should try to replace/soft-deprecate it to something else, but always a little tricky dealing with breaking changes. Fortunately, there will no longer be any real penalty for folks when using it unnecessarily which should help.

"implicit workspace" isn't really a thing anywhere outisde of rustfmt.

Suspect it's probably moot at this point, but I wasn't really sure what you meant here.

Maybe it's just a terminology thing, but it's something we hear from users here, and verbiage I've seen used around other tools (I assumed the etymology had some ties back to cargo's own "implicit relations" wording in workspace contexts). Ultimately we're just talking about folks that have multi-crate projects using relative path dependency references without codifying a workspace.members list. That's not uncommon in my experience, and certainly not a "rustfmt thing" (actually believe both clippy and RLS have this structure in their respective codebases).

Folks with those types of projects also have the (IMO) reasonable desire to be able to format everything in their project, and that's something cargo-fmt has always supported and one of the things I was alluding to earlier.

Thanks again for the discussion, the fix will be in the next release!

Speed up --all on large workspaces

d52eef8

jonhoo mentioned this pull request Mar 19, 2021

cargo fmt --all is slow in a workspace project #4247

Closed

calebcartwright changed the base branch from master to rustfmt-2.0.0-rc.2 April 30, 2021 00:25

calebcartwright closed this Sep 24, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up --all on large workspaces #4722

Speed up --all on large workspaces #4722

matklad commented Feb 25, 2021

calebcartwright commented Feb 25, 2021 •

edited

Loading

matklad commented Feb 25, 2021

calebcartwright commented Feb 25, 2021

matklad commented Feb 25, 2021

calebcartwright commented Feb 25, 2021 •

edited

Loading

matklad commented Feb 25, 2021

calebcartwright commented Feb 25, 2021 •

edited

Loading

matklad commented Feb 26, 2021 •

edited

Loading

calebcartwright commented Feb 26, 2021

matklad commented Feb 26, 2021

matklad commented Feb 26, 2021

matklad commented Feb 26, 2021

calebcartwright commented Sep 24, 2021

Speed up --all on large workspaces #4722

Speed up --all on large workspaces #4722

Conversation

matklad commented Feb 25, 2021

calebcartwright commented Feb 25, 2021 • edited Loading

matklad commented Feb 25, 2021

calebcartwright commented Feb 25, 2021

matklad commented Feb 25, 2021

calebcartwright commented Feb 25, 2021 • edited Loading

matklad commented Feb 25, 2021

calebcartwright commented Feb 25, 2021 • edited Loading

matklad commented Feb 26, 2021 • edited Loading

calebcartwright commented Feb 26, 2021

matklad commented Feb 26, 2021

matklad commented Feb 26, 2021

matklad commented Feb 26, 2021

calebcartwright commented Sep 24, 2021

calebcartwright commented Feb 25, 2021 •

edited

Loading

calebcartwright commented Feb 25, 2021 •

edited

Loading

calebcartwright commented Feb 25, 2021 •

edited

Loading

matklad commented Feb 26, 2021 •

edited

Loading