Don't allocate on SimplifyCfg/Locals/Const on every MIR pass #110477
(rustbot has picked a reviewer for you, use r? to override)
Some changes occurred to MIR optimizations cc @rust-lang/wg-mir-opt
Do you have any evidence to suggest that these are expensive operations? I don't think this code is really worth the extra complication and possibly introducing new panic edges to the compiler just to avoid some string formatting.
@bors try @rust-timer queue
⌛ Trying commit ccc7a0c1af6f58b3081024583d3cfdbcfb4f3434 with merge b89ba5d787d15245a9be6ac6f1619c153b23ea97...
☀️ Try build successful - checks-actions
Finished benchmarking commit (b89ba5d787d15245a9be6ac6f1619c153b23ea97): comparison URL.
Overall result: ✅ improvements - no action needed
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
@bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Apparently keccak and codegen-cranelift have been particularly noisy recently, so I've been advised to ignore those. As for the other positive perf results, I'll have to take a closer look at them to see if they're legit or just noise as well. (Or maybe I'll queue another perf run in the morning and see if these perf results stick?)
comment worth noting from a sidebar conversation about (current behavior, as it relates to this PR):
|
Anyways, @miguelraz, I did some thinking. I think the right approach for this would be to make Something like:
enum SimplifyCfgPassName {
    Initial,
    PromoteConsts,
    ...
}
impl SimplifyCfg {
    fn new(e: SimplifyCfgPassName) -> Self {
        SimplifyCfg { e }
    }
    fn name(&self) -> &'static str {
        match self.e {
            SimplifyCfgPassName::Initial => "SimplifyCfg-initial",
        }
    }
}
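A compilable version of the sketch above might look like the following. It is only an illustration: the struct definition, the derives, and the `"SimplifyCfg-promote-consts"` spelling are assumptions that follow the pattern of the one arm shown, and the variants elided in the comment are left out rather than guessed.

```rust
/// Identifies which run of the pass this is. Only the two variants named in
/// the comment above are listed; the elided ones are intentionally omitted.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SimplifyCfgPassName {
    Initial,
    PromoteConsts,
}

/// Hypothetical stand-in for the pass struct; the real one lives in rustc.
pub struct SimplifyCfg {
    e: SimplifyCfgPassName,
}

impl SimplifyCfg {
    /// Constructing the pass stores a one-byte tag; nothing is formatted
    /// or heap-allocated here.
    pub fn new(e: SimplifyCfgPassName) -> Self {
        SimplifyCfg { e }
    }

    /// The name is a string literal baked into the binary, so calling this
    /// repeatedly never allocates.
    pub fn name(&self) -> &'static str {
        match self.e {
            SimplifyCfgPassName::Initial => "SimplifyCfg-initial",
            // Assumed spelling, mirroring the arm above.
            SimplifyCfgPassName::PromoteConsts => "SimplifyCfg-promote-consts",
        }
    }
}
```

Because the `match` is exhaustive, adding a new variant without giving it a name becomes a compile error rather than a runtime surprise.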
The improvement is probably legit. I've seen major regressions in the past resulting from over-calling |
This would also reduce the size from the size of the string ref (pointer and length, so 8 or 16 bytes depending on the target) to the size of the enum (1 byte for a small fieldless enum), which will save about 7 to 15 bytes per instance of this struct that is in memory at any given moment. Not the most important win compared to allocation overhead, but y'know, everything counts in large amounts.
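The size claim can be checked with `std::mem::size_of`. A minimal sketch, assuming a small fieldless enum (`PassName` and the helper functions are hypothetical, not rustc items):

```rust
use std::mem::size_of;

// Hypothetical fieldless enum standing in for the pass-name enum.
#[allow(dead_code)]
enum PassName {
    Initial,
    PromoteConsts,
    Final,
}

/// Size of a string slice reference: a pointer plus a length.
pub fn str_ref_size() -> usize {
    size_of::<&'static str>()
}

/// Size of the enum: a single discriminant byte for a few variants.
pub fn enum_size() -> usize {
    size_of::<PassName>()
}

/// Bytes saved per instance when the enum is the only field.
pub fn bytes_saved() -> usize {
    str_ref_size() - enum_size()
}
```

On a 64-bit target `bytes_saved()` comes out to 15, and on a 32-bit target to 7. If the struct has other fields, alignment padding can shrink or erase the difference.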
cachegrind results from libc Debug Full:
Lots of noise, but there are some memcpys and core::fmt, so this looks like a legit improvement. Awesome!
I quickly profiled a few other allocation sites inside |
Some of the other wins look related to formatting so probably legit. There shouldn't be many instances of these structs in-flight at the same time, so maybe we wouldn't really see size reduction benefits (nor exhaustiveness for such debugging info), and can e.g. take the name as |
Even simpler: we can make |
@cjgillot yes, I just realized that and pushed that very change, thanks for the tip!
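The "even simpler" suggestion is cut off in this capture, so what follows is a guess at its shape, not a record of the change that was pushed. One reading, consistent with the later request to name the enum after the pass itself, is that the pass type becomes the enum and the wrapper struct disappears. The variant list and name strings below are illustrative assumptions.

```rust
/// Assumed shape: the pass type is itself a fieldless enum, one variant per
/// place in the pipeline where the pass runs. Illustrative subset only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SimplifyCfg {
    Initial,
    PromoteConsts,
}

impl SimplifyCfg {
    /// Maps each variant to a string literal; no wrapper struct, no
    /// constructor, and no allocation.
    pub fn name(&self) -> &'static str {
        match self {
            SimplifyCfg::Initial => "SimplifyCfg-initial",
            // Assumed spelling, following the pattern of the arm above.
            SimplifyCfg::PromoteConsts => "SimplifyCfg-promote-consts",
        }
    }
}
```

Under this reading a call site would write `SimplifyCfg::Initial` directly where it previously called a constructor.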
@miguelraz can you squash this into one commit? Other than that, this PR looks good to go.
36b3f70 to fc27ae1
@bors r=compiler-errors
pub fn new(label: &str) -> Self {
    SimplifyConstCondition { label: format!("SimplifyConstCondition-{}", label) }
}
pub enum SimplifyConstConditionPassName {
(nit) no need to block the PR on this, but if you touch this code again in the future could you rename this to just SimplifyConstCondition? That would make it consistent with all the other ones.
Thanks for this, finally got around to this fix.
#110657
☀️ Test successful - checks-actions
Finished benchmarking commit (9e7f72c): comparison URL.
Overall result: ✅ improvements - no action needed
@rustbot label: -perf-regression
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
For the record, this PR was a revived attempt of #108026.
…ctor, r=compiler-errors
nit: consistent naming for SimplifyConstCondition
Fixing a small naming inconsistency that `@JakobDegen` brought up in rust-lang#110477 (comment). Please signal for rollup.
Hey! 👋🏾 This is a first PR attempt to see if I could speed up some rustc internals.
Thought process:
in compiler/src/rustc_mir_transform/simplify.rs fires multiple times per MIR analysis. This means that a likely string allocation is happening in each of these runs, which may add up, as they are not being lazily allocated or cached in between the different passes.
...yes, I know that adding a global static array is probably not the future-proof solution, but I wanted to lob this now as a proof of concept to see if it's worth shaving off a few cycles and then making more robust.
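To make the concern concrete, here is a hedged side-by-side sketch. The first constructor mirrors the `format!`-based `new` quoted earlier in the review, which builds a fresh `String` on every call. The second stores a `&'static str` and allocates nothing. Both struct names are hypothetical and the `"final"` label is only an example.

```rust
/// Mirrors the shape of the pre-change code: every construction formats
/// and heap-allocates a new label.
pub struct AllocatingPass {
    label: String,
}

impl AllocatingPass {
    pub fn new(label: &str) -> Self {
        AllocatingPass { label: format!("SimplifyConstCondition-{}", label) }
    }

    pub fn name(&self) -> &str {
        &self.label
    }
}

/// Allocation-free alternative: the full name is a literal supplied by the
/// caller and lives in the binary's read-only data.
pub struct StaticPass {
    name: &'static str,
}

impl StaticPass {
    pub const fn new(name: &'static str) -> Self {
        StaticPass { name }
    }

    pub fn name(&self) -> &'static str {
        self.name
    }
}
```

Both report the same name, for example `AllocatingPass::new("final").name()` and `StaticPass::new("SimplifyConstCondition-final").name()`, but only the first touches the allocator when constructed.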