Fix lint perf regressions #105485
@bors try @rust-timer queue
⌛ Trying commit 33cc7df30e6e99df87ca5e82cd8877577717c872 with merge b6d121e1232514d0c98d173e0ab05c40684601ab...
☀️ Try build successful - checks-actions
Finished benchmarking commit (b6d121e1232514d0c98d173e0ab05c40684601ab): comparison URL.
Overall result: ✅ improvements - no action needed
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
@bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This benchmark run did not return any relevant results for this metric.
struct RuntimeCombinedEarlyLintPass<'a> {
    passes: &'a mut [EarlyLintPassObject],
}
Instead of having separate branches for the builtin-only vs builtin+others cases, can something like this work?
- struct RuntimeCombinedEarlyLintPass<'a> {
-     passes: &'a mut [EarlyLintPassObject],
- }
+ struct RuntimeCombinedEarlyLintPass<'a, T: EarlyLintPass> {
+     builtin: T,
+     passes: &'a mut [EarlyLintPassObject],
+ }
This would require changing the macros, but it would give us static dispatch for the builtin lints while limiting the complexity.
That won't help. Under the original structure, we had `builtin_lints` and `passes`, but `passes` is always empty for normal compilation. (It's non-empty when bootstrapping, because rustc has some rustc-specific lints, and it's also non-empty for clippy.)
I then combined `builtin_lints` into `passes`. This meant that even in the normal builtins-only case, the code had to iterate over the `passes` slice for every `check_*` call. This is what caused the slowdown. It took me a while to work out because the iterator methods are inlined, so it wasn't that obvious.
With this new design, which is very similar to the original design, if `passes` is empty then we avoid all iteration over the slice. This gives the speed wins.
Pseudocode may make it clearer. Simplifying greatly, it's basically the difference between this:
for node in ast {
    builtin_lints.check_node(node);
}
and this:
let mut passes = vec![builtin_lints];
for node in ast {
    for pass in &mut passes {
        pass.check_node(node);
    }
}
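To make the comparison above concrete, here is a minimal runnable sketch of the two shapes. Everything in it is a simplified assumption for illustration: `LintPass`, `CountingLint`, `RuntimeCombined`, and `lint_crate` are hypothetical stand-ins, not rustc's actual types, and the "AST" is just a slice of integers.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a lint pass; rustc's real traits have many `check_*` methods.
trait LintPass {
    fn check_node(&mut self, node: u32);
}

/// Toy lint that counts how many nodes it was asked to check.
struct CountingLint {
    hits: Rc<Cell<usize>>,
}

impl LintPass for CountingLint {
    fn check_node(&mut self, _node: u32) {
        self.hits.set(self.hits.get() + 1);
    }
}

/// Runtime-combined pass: every call loops over a slice of boxed passes.
struct RuntimeCombined<'a> {
    passes: &'a mut [Box<dyn LintPass>],
}

impl LintPass for RuntimeCombined<'_> {
    fn check_node(&mut self, node: u32) {
        // This inner loop runs once per node, even if the slice holds one pass.
        for pass in self.passes.iter_mut() {
            pass.check_node(node);
        }
    }
}

/// Single traversal, generic over the pass, so a concrete pass is called directly.
fn lint_crate<P: LintPass>(ast: &[u32], mut pass: P) {
    for &node in ast {
        pass.check_node(node);
    }
}
```

Calling `lint_crate` with a `CountingLint` gives the first shape (one direct call per node); calling it with a `RuntimeCombined` wrapping a one-element vector gives the second (a slice iteration plus a dynamic call per node). Both visit the same nodes; only the per-call overhead differs.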
})*
}
)
}
Do you have an explanation why this commit has an effect on performance? Is this a matter of code size vs indirection vs inlining?
As above: the last commit is the one that regains the performance, because it separates the `builtin_lints` handling from the `passes` handling, avoiding all slice iteration when `passes` is empty. The second last commit is preparatory work.
(And I forgot to mention that the first five commits are repeated from #105416, sorry.)
if passes.is_empty() {
    late_lint_crate_inner(tcx, context, builtin_lints);
} else {
    passes.push(Box::new(builtin_lints));
    let pass = RuntimeCombinedLateLintPass { passes: &mut passes[..] };
    late_lint_crate_inner(tcx, context, pass);
}
This is the important bit: there are two different calls to `late_lint_crate_inner`. The builtins-only one is faster than the `passes` one, even if `passes` has just one pass, because it doesn't involve slice iteration for each `check_foo` call.
Another thing worth mentioning: this new structure is better than the old structure when `passes` is non-empty. The original structure involved two AST traversals, roughly:
late_lint_crate_inner(builtin_lints);
if !passes.is_empty() {
    late_lint_crate_inner(passes);
}
The new structure always involves one AST traversal:
if passes.is_empty() {
    late_lint_crate_inner(builtin_lints);
} else {
    late_lint_crate_inner(builtin_lints + passes);
}
So this should make bootstrapping and clippy slightly faster, though the difference may not be significant in practice.
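A rough, self-contained sketch of the single-traversal structure, under the same kind of simplifying assumptions: `LintPass`, `RecordingLint`, `RuntimeCombined`, `lint_crate_inner`, and `lint_crate` are hypothetical names for illustration, the "AST" is a slice of integers, and `lint_crate_inner` returns 1 only so that traversals can be counted.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a lint pass; not rustc's real `LateLintPass` trait.
trait LintPass {
    fn check_node(&mut self, node: u32);
}

/// Toy lint that records each (lint name, node) visit in a shared log.
struct RecordingLint {
    name: &'static str,
    log: Rc<RefCell<Vec<(&'static str, u32)>>>,
}

impl LintPass for RecordingLint {
    fn check_node(&mut self, node: u32) {
        self.log.borrow_mut().push((self.name, node));
    }
}

/// Runtime-combined pass: forwards each call to every boxed pass in the slice.
struct RuntimeCombined<'a> {
    passes: &'a mut [Box<dyn LintPass>],
}

impl LintPass for RuntimeCombined<'_> {
    fn check_node(&mut self, node: u32) {
        for pass in self.passes.iter_mut() {
            pass.check_node(node);
        }
    }
}

/// Walks the toy AST once and returns 1, so callers can count traversals.
fn lint_crate_inner<P: LintPass>(ast: &[u32], mut pass: P) -> usize {
    for &node in ast {
        pass.check_node(node);
    }
    1
}

/// One traversal in both branches: builtins alone, or builtins folded into `passes`.
fn lint_crate<B: LintPass + 'static>(
    ast: &[u32],
    builtin: B,
    mut passes: Vec<Box<dyn LintPass>>,
) -> usize {
    if passes.is_empty() {
        // Fast path: the builtin pass is called directly, with no slice iteration.
        lint_crate_inner(ast, builtin)
    } else {
        // Slow path: still a single traversal, with the builtin pass appended last.
        passes.push(Box::new(builtin));
        lint_crate_inner(ast, RuntimeCombined { passes: &mut passes[..] })
    }
}
```

With one extra pass supplied, each node is seen by the extra pass and then by the builtin pass within the same walk, rather than in two separate walks over the AST.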
Thank you. r=me once #105416 lands.
This matches the name used in `late.rs`.
I removed these in rust-lang#105291, and subsequently learned they are necessary for performance. This commit reinstates them with the new and more descriptive names `RuntimeCombined{Early,Late}LintPass`, similar to the existing passes like `BuiltinCombinedEarlyLintPass`. It also adds some comments, particularly emphasising how we have ways to combine passes at both compile-time and runtime. And it moves some comments around.
This commit partly undoes rust-lang#104863, which combined the builtin lints pass with other lints. This caused a slowdown, because often there are no other lints, and it's faster to do a pass with a single lint directly than it is to do a combined pass with a `passes` vector containing a single lint.
33cc7df to 4ff5a36
☀️ Test successful - checks-actions
Finished benchmarking commit (b397bc0): comparison URL.
Overall result: ✅ improvements - no action needed
@rustbot label: -perf-regression
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
#104863 caused small but widespread regressions in lint performance. I tried to improve things in #105291 and #105416 with minimal success, before fully understanding what caused the regression. This PR effectively reverts all of #105291 and part of #104863 to fix the perf regression.
r? @cjgillot