Speed up --all on large workspaces #4722
Thanks for the PR, but I think this is going to cause breakage in the required behavior (discussed in detail in #4247, summarized in #4247 (comment)). There have been some very recent updates to the cargo metadata output that will allow us to maintain the format-the-entire-world behavior that's required of
I believe this PR preserves existing behavior (but I didn't look too closely here).
I'm on a phone right now, so I haven't taken too close a look yet either. However, even if it doesn't cause any missed implicit members, I'm not sure it will do much in the way of addressing the real problem. When running with If there are somehow scenarios where we end up running The major benefit here is going to come from overhauling all of this to take advantage of the updated cargo metadata output so that we can always run with
I believe this is wrong. Metadata output is exactly the same for every workspace member (only
I appreciate that point of view, as it's the initial reaction most folks (including myself 😄) have. However, there's a really critical, albeit subtle, difference between the outputs that makes all the difference. When running with The recent updates in cargo have finally enabled us to work past this issue.
The The problem is that, for a workspace with N members, If I might be wrong here, and there might be cases where calling
I think we may be talking past each other a bit, though that's my fault for not articulating my thinking very well. In short, I'm really just trying to think through whether there are any edge cases where this could be problematic, and if not, whether such a tactical improvement on the current flow would be worthwhile given the strategic plan (driven by To expand on some of my earlier comments, if there are cases where we can improve the current flow without missing a package in some edge case, then that's always a plus. My sense from the changes/discussion is that there seems to be such a possible improvement in some explicit workspace cases. I do have a vague recollection of some odd edge cases that were reported a couple of years ago, but I'm happy to take a deeper look to rule out any such issues.
If we can avoid the pair of E.g. if Still want to get to a conclusion on this regardless, but the I'd already started working on the --no-deps related changes, but will also take a look at ruling out any potential edge-case side effects from this in parallel. If you're willing and have the bandwidth to check what kind of performance improvement your changes provide, that'd be most appreciated! If it looks like it helps, then I'm happy to pause the --no-deps work for a bit to try to get this incorporated and released 👍
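For readers following along, here is a minimal sketch of the two `cargo metadata` invocations being compared, built around a hypothetical helper `metadata_command` (this is not cargo-fmt's actual code). `--format-version`, `--manifest-path`, and `--no-deps` are real `cargo metadata` flags. Per cargo's documentation, `--no-deps` reports only workspace members and does not fetch dependencies, which is what avoids the registry/index hit discussed in this thread. The sketch only builds the `Command` and prints its arguments, so it runs without a Cargo project.

```rust
use std::ffi::OsStr;
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: builds (but does not run) a `cargo metadata` invocation.
/// When `no_deps` is true, cargo only reports workspace members and skips
/// dependency resolution, so no registry/index access is needed.
fn metadata_command(manifest_path: &Path, no_deps: bool) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("metadata")
        .arg("--format-version")
        .arg("1")
        .arg("--manifest-path")
        .arg(manifest_path);
    if no_deps {
        // Only workspace members; dependencies are neither fetched nor resolved.
        cmd.arg("--no-deps");
    }
    cmd
}

fn main() {
    let cmd = metadata_command(Path::new("Cargo.toml"), true);
    // Inspect the arguments instead of spawning cargo.
    let args: Vec<&OsStr> = cmd.get_args().collect();
    println!("{:?}", args);
}
```

To actually run it, one would call `.output()` on the returned `Command` and parse the JSON written to stdout.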
Here are the timings before this change:
Here are the timings after this change:
I also tried to estimate the speedup we'd get by getting rid of the run without So my current belief is that this PR substantially improves perf due to the big-O change, while the
🎉 That looks great, thanks for sharing; I will be ecstatic to be wrong about this! By any chance, did you test a subsequent local run without changes and/or a run with the changes but with a clean registry cache? I want to completely rule out any potential inflation of the pre-change run due to it being the one that took the first registry hit.
You're far more familiar with the innards of cargo than I so please let me know if I'm wrong on this one, but my understanding is that the
I'm still skeptical of that latter part of the statement. We've seen/heard repeatedly about the network/registry hits with
If people observe network activity with
Trying that right now. As expected, something somewhere is running
Ah, it's lines 678 to 682 in 2e1a982.
TBH, I feel that 90% of people who type
Apologies for the long delay on this; I let the testing stack up and then got pulled off in other directions. Thanks again for the suggested changes, and while we will indeed incorporate your suggestion, I'm going to close this in favor of #4997. The change as originally proposed probably would have been fine. However, it's an area that's been a recurring source of subtle bugs and one where we have really poor test coverage, which always gives us pause. Plus, as noted earlier, there are other problem scenarios that would have gone unaddressed, so going with the cargo updates instead will both definitely allow us to pull in your optimization and solve those other problems. As a reference, here's what I got when running with a clean cargo cache using just your proposed change (I manually made the misformatting change on my local copy of the nearcore repo to provide a visual output; some file paths removed):
Admittedly, I'm using some rather pedestrian hardware and am on fairly spotty wifi, but that really just underscores the point. Neither cargo-fmt nor rustfmt truly needs an internet connection or the associated index hit, but cargo-fmt had to do it in order to support a handful of frustrating edge cases. That has caused some real, reported challenges for users, both in network-restricted scenarios and more broadly in ephemeral/CI-type environments that often start with an empty cache. Additionally, and as discussed above/in prior issues, even once everything is cached the current/unmodified version of cargo fmt still takes an abysmal amount of time:
Fortunately, this is improved dramatically just from dropping the
This continues to hold true even with a completely clean cache:
Those massive improvements in both cached and uncached environments should make for very real and noticeable improvements to the user experience. Furthermore, with the cargo metadata updates that guarantee every node, package or dependency, will have the requisite information, we can pull in your suggestion to speed things up even more:
Which now has
Yup, was one of those unfortunate but necessary things that was needed before in order to ensure the path information for dependency nodes would be available.
I know what you mean. I've been making the case that we should try to replace/soft-deprecate it with something else, but it's always a little tricky dealing with breaking changes. Fortunately, there will no longer be any real penalty for folks when using it unnecessarily, which should help.
I suspect it's probably moot at this point, but I wasn't really sure what you meant here. Maybe it's just a terminology thing, but it's something we hear from users here, and verbiage I've seen used around other tools (I assumed the etymology had some ties back to cargo's own "implicit relations" wording in workspace contexts). Ultimately, we're just talking about folks who have multi-crate projects using relative path dependency references without codifying a Folks with those types of projects also have the (IMO) reasonable desire to be able to format everything in their project, and that's something cargo-fmt has always supported and one of the things I was alluding to earlier. Thanks again for the discussion, the fix will be in the next release!
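To make the "implicit members" idea concrete, here is a minimal sketch. It assumes a hypothetical, already-resolved map from each crate directory to the directories named by its `path = "..."` dependencies; per this thread, cargo-fmt actually gets that path information from `cargo metadata` output, not from a map like this. Starting at the root crate, a breadth-first walk collects every local crate reachable through path dependencies, so a crate that is never listed in any workspace still gets formatted. The `seen` set handles crates shared by several dependents, as well as cycles.

```rust
use std::collections::{HashMap, HashSet, VecDeque};
use std::path::{Path, PathBuf};

/// Returns every local crate directory reachable from `root` via path
/// dependencies, in breadth-first order. `path_deps` is a hypothetical
/// pre-resolved map: crate directory -> directories of its path dependencies.
fn reachable_local_crates(root: &Path, path_deps: &HashMap<PathBuf, Vec<PathBuf>>) -> Vec<PathBuf> {
    let mut seen: HashSet<PathBuf> = HashSet::new();
    let mut order = Vec::new();
    let mut queue = VecDeque::from([root.to_path_buf()]);
    while let Some(dir) = queue.pop_front() {
        // Skip crates already visited (shared dependencies or cycles).
        if !seen.insert(dir.clone()) {
            continue;
        }
        if let Some(deps) = path_deps.get(&dir) {
            queue.extend(deps.iter().cloned());
        }
        order.push(dir);
    }
    order
}

fn main() {
    let mut path_deps: HashMap<PathBuf, Vec<PathBuf>> = HashMap::new();
    path_deps.insert("app".into(), vec!["core".into(), "utils".into()]);
    path_deps.insert("core".into(), vec!["utils".into()]);
    // Not reachable from "app", so it is not collected.
    path_deps.insert("unrelated".into(), vec![]);
    for dir in reachable_local_crates(Path::new("app"), &path_deps) {
        println!("format: {}", dir.display());
    }
}
```

Running `main` lists `app`, `core`, and `utils` but not `unrelated`.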
I've noticed that cargo fmt --all takes an unreasonable 9 seconds on https://github.com/near/nearcore. The culprit seems to be that we call cargo metadata on every package in the workspace, but the metadata output is the same for each of them!
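Here is a minimal sketch of the optimization described above, under stated assumptions. `Metadata`, `collect_metadata_naive`, and `collect_metadata_dedup` are hypothetical names, and the `fetch` closure stands in for running and parsing `cargo metadata`. Because a single call already lists every workspace member, members covered by an earlier result can be skipped. This turns N calls for an N-member workspace into one call per workspace.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for parsed `cargo metadata` output: the manifest
/// paths of all workspace members it reports.
#[derive(Debug, Clone)]
struct Metadata {
    member_manifests: Vec<PathBuf>,
}

/// Naive strategy: one metadata call per member (N calls for N members).
fn collect_metadata_naive<F>(members: &[PathBuf], mut fetch: F) -> Vec<Metadata>
where
    F: FnMut(&Path) -> Metadata,
{
    members.iter().map(|m| fetch(m.as_path())).collect()
}

/// Deduplicated strategy: skip members already reported by an earlier call.
fn collect_metadata_dedup<F>(members: &[PathBuf], mut fetch: F) -> Vec<Metadata>
where
    F: FnMut(&Path) -> Metadata,
{
    let mut covered: HashSet<PathBuf> = HashSet::new();
    let mut results = Vec::new();
    for member in members {
        if covered.contains(member) {
            continue; // Its metadata was already returned by a previous call.
        }
        let meta = fetch(member.as_path());
        covered.extend(meta.member_manifests.iter().cloned());
        covered.insert(member.clone()); // Guard against output omitting the queried member.
        results.push(meta);
    }
    results
}

fn main() {
    let members: Vec<PathBuf> = vec!["a/Cargo.toml".into(), "b/Cargo.toml".into(), "c/Cargo.toml".into()];
    let mut naive_calls = 0;
    collect_metadata_naive(&members, |_| {
        naive_calls += 1;
        Metadata { member_manifests: members.clone() }
    });
    let mut dedup_calls = 0;
    collect_metadata_dedup(&members, |_| {
        dedup_calls += 1;
        Metadata { member_manifests: members.clone() }
    });
    println!("naive: {} calls, dedup: {} calls", naive_calls, dedup_calls);
}
```

Members of an unrelated workspace would not appear in the first result, so they would still get their own call.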