-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hint optimizer about reserved capacity #116568
Conversation
r? @m-ou-se (rustbot has picked a reviewer for you, use r? to override) |
128562c
to
6767186
Compare
This comment has been minimized.
This comment has been minimized.
6767186
to
d4c78ff
Compare
I think this is what core::intrinsics::assume is for, which should probably be preferred. Seems to generate the same assembly too. |
d4c78ff
to
9a55d7a
Compare
Let's give this a whirl. |
This comment has been minimized.
This comment has been minimized.
Hint optimizer about reserved capacity The fact that `Vec` had capacity increased is not known to the optimizer, because functions like `do_reserve_and_handle` are not inlined. Because of that, code such as: ```rust vec.try_reserve(123)?; vec.extend_from_slice(&s[..123]); ``` Tries to reserve the capacity **twice**. This is unnecessary code bloat. https://rust.godbolt.org/z/YWY16Ezej Adding a hint after reserve optimizes out the next check of `self.needs_to_grow(len, additional)`.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (5137d8a): comparison URL. Overall result: ❌ regressions - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 626.006s -> 627.675s (0.27%) |
slightly concerned that at larger scales like rustc, we're somehow seeing even larger artifacts, despite the minimal example's well-demonstrated changes. with no concomitant perf gains, either. I think we should have a codegen test that unquestionably demonstrates the improvement here before we commit this. and maybe a more comprehensive change that makes the results slightly less middling at the larger scale. |
Unfortunately, the hint in I've even tried using
The unexpected speed regression happens also in #114370 |
0191b93
to
471ec70
Compare
This comment has been minimized.
This comment has been minimized.
048c423
to
aa25a20
Compare
This comment has been minimized.
This comment has been minimized.
aa25a20
to
b90a3af
Compare
I think the regressions are expected here:
So I think paths forward:
|
This does look more promising. The binary size remarks earlier were more of a "hrm, apparent cost without benefit" kind of remark. I think we'd still want a codegen test either way, and then if you feel it would be worth testing the hint on |
36d4f56
to
c03a3af
Compare
Hint optimizer about try-reserved capacity This is rust-lang#116568, but limited only to the less-common `try_reserve` functions to reduce bloat in debug binaries from debug info, while still addressing the main use-case rust-lang#116570
…ngjubilee Hint optimizer about try-reserved capacity This is rust-lang#116568, but limited only to the less-common `try_reserve` functions to reduce bloat in debug binaries from debug info, while still addressing the main use-case rust-lang#116570
☔ The latest upstream changes (presumably #117503) made this pull request unmergeable. Please resolve the merge conflicts. |
c03a3af
to
3e1cf5b
Compare
I wonder if that's just noise or if something changed in the meantime or the smaller change is genuinely that much better on its own. Let's see what we can see. @bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Hint optimizer about reserved capacity The fact that `Vec` had capacity increased is not known to the optimizer, because functions like `do_reserve_and_handle` are not inlined. (rust-lang#82801) Because of that, code such as: ```rust vec.try_reserve(123)?; vec.extend_from_slice(&s[..123]); ``` Tries to reserve the capacity **twice**. This is unnecessary code bloat. https://rust.godbolt.org/z/YWY16Ezej Adding a hint after reserve optimizes out the next check of `self.needs_to_grow(len, additional)`.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (361898b): comparison URL. Overall result: ❌✅ regressions and improvements - ACTION NEEDEDBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 636.185s -> 634.65s (-0.24%) |
It looks like a mixed bag again. With Maybe eventually debug info/assume in llvm or whatever is costing here will get optimized and this can be revisited. |
Yeah, that's a much less beneficial diff, whereas the try-reserve-only version was much better on its own (even if it was mostly noise, it had no regressions), so this seems like the right balance. |
This is likely mostly noise, look at PR that changed only a comment: Similar things has happened to other PRs as well. |
@mati865 It would be nice to have a better idea somehow of how noisy a statistical output is so that we can make better-informed decisions, I feel the current metrics don't adequately capture this thing. I feel like the decisions here were correct anyways, though, because it seems to be that this PR was consistently mixed and the try-reserve version was consistently mildly-positive-or-not-bad. |
It's not noisy typically but lately something odd happened, take a look at this graph:
I do agree with that, previous runs had no significant noise. One more PR with similar noise: #116412 (comment) EDIT: Created Zulip thread: https://rust-lang.zulipchat.com/#narrow/stream/247081-t-compiler.2Fperformance/topic/Recent.20noise.20spikes |
The fact that
Vec
had capacity increased is not known to the optimizer, because functions likedo_reserve_and_handle
are not inlined. (#82801) Because of that, code such as:Tries to reserve the capacity twice. This is unnecessary code bloat.
https://rust.godbolt.org/z/YWY16Ezej
Adding a hint after reserve optimizes out the next check of
self.needs_to_grow(len, additional)
.