Make RawVec::grow mostly non-generic. #72013
Here is some sample
The current patch gets rid of most of that. My local perf results are mostly good, with instruction-count reductions of up to 9%, but there are a few small regressions. I'm not quite sure where the regressions are coming from; I will investigate that more on Monday. The code is in draft form and needs cleaning up before being properly reviewed. @Amanieu may be interested.
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit 6d2926b6381d178d7d59b6112949c3c6a2e96568 with merge d4d11c4e38c5b4fe42c2d2c10124bc45ba1fbcc8...
cc @davidtwco @nikomatsakis -- this seems relevant for the polymorphization efforts, shows at least one potential big win
Yes, it does!
💥 Test timed out
@bors try
⌛ Trying commit 6d2926b6381d178d7d59b6112949c3c6a2e96568 with merge cee9586dc574582deea517fa2d0eaeeb882167e3...
☀️ Try build successful - checks-actions, checks-azure
Queued cee9586dc574582deea517fa2d0eaeeb882167e3 with parent 7b80539, future comparison URL.
Finished benchmarking try commit cee9586dc574582deea517fa2d0eaeeb882167e3, comparison URL.
The perf results are all over the place. They're easier to understand if you focus on two subsets.
Hopefully I can fix the slowdowns without too much trouble. I will investigate that tomorrow.
I think the slowdowns are caused by worse code being generated in some cases due to the … I'm taking a slightly different tack now, trying to keep those …
BTW, for the attached patches, I've seen reductions in the number of lines of generated LLVM IR as high as 15% (for …
Force-pushed from 6d2926b to 77aa42c.
I've reworked the code significantly, giving wins that are slightly smaller than before but avoiding the vast majority of the losses. There is scope for pushing harder on moving stuff out of …
@bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit 77aa42ca0e3b632fee16bcd3fe23ab33ce5c066b with merge 78ecf2ce2428bc1c359a284c6dc8bc33246879ac...
☀️ Try build successful - checks-actions, checks-azure
Queued 78ecf2ce2428bc1c359a284c6dc8bc33246879ac with parent aeb4738, future comparison URL.
Perf results are looking pretty good. Debug builds have some wins of up to 5.7%. Opt builds have a few wins, up to 1.7%. Check builds are mostly very slightly regressed, typically by 0.2%. I will fiddle with this some more today and see if I can make it any better.
It's unused.
📌 Commit 68b7503 has been approved by
@bors rollup=never Because it affects perf.
@bors p=1
☀️ Test successful - checks-actions, checks-azure
Currently, if you repeatedly push to an empty vector, the capacity growth sequence is 0, 1, 2, 4, 8, 16, etc. This commit changes the relevant code (the "amortized" growth strategy) to skip 1 and 2 in most cases, instead using 0, 4, 8, 16, etc. (You can still get a capacity of 1 or 2 using the "exact" growth strategy, e.g. via `reserve_exact()`.)

This idea (along with the phrase "tiny Vecs are dumb") comes from the "doubling" growth strategy that was removed from `RawVec` in rust-lang#72013. That strategy was barely ever used -- only when a `VecDeque` was grown, oddly enough -- which is why it was removed in rust-lang#72013.

(Fun fact: until just a few days ago, I thought the "doubling" strategy was used for the repeated-push case. In other words, this commit makes `Vec`s behave the way I always thought they behaved.)

This change reduces the number of allocations done by rustc itself by 10% or more. It speeds up rustc, and will also speed up any other Rust program that uses `Vec`s a lot.
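The amortized rule described above can be sketched as a small capacity calculation. This is a minimal sketch, not the code from the commit: `min_non_zero_cap`, `amortized_new_cap` and `growth_sequence` are hypothetical names, and the size thresholds (8 for 1-byte elements, 4 for elements up to 1 KiB, 1 otherwise) are an assumption about what "in most cases" covers.

```rust
use std::cmp;

// Assumed thresholds (hypothetical): tiny elements start at 8, moderate ones
// at 4, and very large elements at 1 to avoid wasting memory.
fn min_non_zero_cap(elem_size: usize) -> usize {
    if elem_size == 1 {
        8
    } else if elem_size <= 1024 {
        4
    } else {
        1
    }
}

// Amortized growth: at least double, at least what is required, and never
// below the minimum non-zero capacity. Returns None on arithmetic overflow.
fn amortized_new_cap(cap: usize, len: usize, additional: usize, elem_size: usize) -> Option<usize> {
    let required = len.checked_add(additional)?;
    let doubled = cap.checked_mul(2)?;
    Some(cmp::max(min_non_zero_cap(elem_size), cmp::max(doubled, required)))
}

// Capacities seen while pushing `pushes` elements one at a time.
fn growth_sequence(elem_size: usize, pushes: usize) -> Vec<usize> {
    let mut cap = 0;
    let mut seq = vec![cap];
    for len in 0..pushes {
        if len == cap {
            cap = amortized_new_cap(cap, len, 1, elem_size).expect("capacity overflow");
            seq.push(cap);
        }
    }
    seq
}

fn main() {
    // 4-byte elements (e.g. u32): prints [0, 4, 8, 16, 32]
    println!("{:?}", growth_sequence(4, 20));
    // very large elements keep the old sequence: prints [0, 1, 2, 4, 8, 16, 32]
    println!("{:?}", growth_sequence(2048, 20));
}
```

`Vec` does not document its exact growth sequence, so real programs should not rely on these particular numbers.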
The final perf improvements are here.
…nieu Tiny Vecs are dumb.

Currently, if you repeatedly push to an empty vector, the capacity growth sequence is 0, 1, 2, 4, 8, 16, etc. This commit changes the relevant code (the "amortized" growth strategy) to skip 1 and 2, instead using 0, 4, 8, 16, etc. (You can still get a capacity of 1 or 2 using the "exact" growth strategy, e.g. via `reserve_exact()`.)

This idea (along with the phrase "tiny Vecs are dumb") comes from the "doubling" growth strategy that was removed from `RawVec` in rust-lang#72013. That strategy was barely ever used -- only when a `VecDeque` was grown, oddly enough -- which is why it was removed in rust-lang#72013.

(Fun fact: until just a few days ago, I thought the "doubling" strategy was used for the repeated-push case. In other words, this commit makes `Vec`s behave the way I always thought they behaved.)

This change reduces the number of allocations done by rustc itself by 10% or more. It speeds up rustc, and will also speed up any other Rust program that uses `Vec`s a lot.

In theory, the change could increase memory usage, but in practice it doesn't. It would be an unusual program where very small `Vec`s having a capacity of 4 rather than 1 or 2 would make a difference. You'd need a *lot* of very small `Vec`s, and/or some very small `Vec`s with very large elements.

r? @Amanieu
    #[inline]
    fn grow_if_necessary(&mut self) {
    #[inline(never)]
    fn grow(&mut self) {
        if self.is_full() {
this check is duplicated now, isn't it?
Agreed. A better change might have been to keep the inline function but have it call a never-inlined `grow_always` instead. That should lead to the same code generation in the end.
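The alternative suggested in this review (keep the inlined check, and move the actual growth into a separate never-inlined function) can be sketched as follows. The `Buffer` type, its `Vec<Option<T>>` storage, the initial capacity of 4 and the body of `grow_always` are illustrative assumptions; this is a simplified linear buffer, not the `VecDeque` ring buffer from the PR.

```rust
// Hypothetical growable buffer; `slots.len()` is its capacity.
struct Buffer<T> {
    slots: Vec<Option<T>>,
    len: usize,
}

impl<T> Buffer<T> {
    fn new() -> Self {
        Buffer { slots: Vec::new(), len: 0 }
    }

    fn is_full(&self) -> bool {
        self.len == self.slots.len()
    }

    fn capacity(&self) -> usize {
        self.slots.len()
    }

    // Fast path: a single comparison, cheap enough to inline at every call site.
    #[inline]
    fn grow_if_necessary(&mut self) {
        if self.is_full() {
            self.grow_always();
        }
    }

    // Slow path: kept out of line so callers don't carry the growth code.
    // It does not re-check `is_full()`, because the caller already did.
    #[inline(never)]
    fn grow_always(&mut self) {
        let new_cap = if self.slots.is_empty() { 4 } else { self.slots.len() * 2 };
        self.slots.resize_with(new_cap, || None);
    }

    fn push(&mut self, value: T) {
        self.grow_if_necessary();
        self.slots[self.len] = Some(value);
        self.len += 1;
    }
}

fn main() {
    let mut buf = Buffer::new();
    for i in 0..10u32 {
        buf.push(i);
    }
    // prints "len = 10, capacity = 16"
    println!("len = {}, capacity = {}", buf.len, buf.capacity());
}
```

With this split the `is_full()` check appears exactly once, in the small inlined wrapper. Note that `#[inline]` and `#[inline(never)]` are hints to the compiler rather than hard guarantees.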
In servo/servo#26713 (comment) … I really don’t have a good sense of scale: does that sound like a lot, given “we want it to be as small as possible” in the code comment?
I have tried shrinking …
I understand it’s not easy, and I’m sure this PR has already improved things. I was wondering how reasonable 165 lines sounds, but maybe the easiest thing would be to look at those lines and see what they do.
`cargo-llvm-lines` shows that, in various benchmarks, `RawVec::grow` is instantiated 10s or 100s of times and accounts for 1-8% of lines of generated LLVM IR.

This commit moves most of `RawVec::grow` into a separate function that isn't parameterized by `T`, which means it doesn't need to be instantiated many times. This reduces compile time significantly.
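The split described above can be sketched with a toy raw vector: the generic method only works out the capacity and the `Layout` for `T`, and a non-generic helper makes the allocator calls, so that helper is compiled once rather than once per element type. `MiniRawVec`, `grow_amortized` and `finish_grow` here are hypothetical, simplified stand-ins (no custom allocator, no zero-sized types, panics instead of returning errors), not the code in this PR.

```rust
use std::alloc::{self, Layout};
use std::cmp;
use std::mem;
use std::ptr::NonNull;

// Hypothetical, simplified raw vector: a pointer and a capacity, no length.
struct MiniRawVec<T> {
    ptr: NonNull<T>,
    cap: usize,
}

impl<T> MiniRawVec<T> {
    fn new() -> Self {
        assert!(mem::size_of::<T>() != 0, "zero-sized types are not handled in this sketch");
        MiniRawVec { ptr: NonNull::dangling(), cap: 0 }
    }

    // Generic part: instantiated once per `T`, but kept small. It only
    // computes the new capacity and the type-specific layouts.
    fn grow_amortized(&mut self, len: usize, additional: usize) {
        let required = len.checked_add(additional).expect("capacity overflow");
        let new_cap = cmp::max(cmp::max(self.cap.saturating_mul(2), required), 4);
        let new_layout = Layout::array::<T>(new_cap).expect("capacity overflow");
        let current = if self.cap == 0 {
            None
        } else {
            let old_layout = Layout::array::<T>(self.cap).expect("valid old layout");
            Some((self.ptr.cast::<u8>(), old_layout))
        };
        self.ptr = finish_grow(new_layout, current).cast::<T>();
        self.cap = new_cap;
    }
}

// Non-generic part: compiled once, no matter how many element types use it.
#[inline(never)]
fn finish_grow(new_layout: Layout, current: Option<(NonNull<u8>, Layout)>) -> NonNull<u8> {
    // SAFETY: `new_layout` has a non-zero size (at least 4 non-zero-sized
    // elements). When reallocating, `old_layout` is the layout the block was
    // allocated with, and it has the same alignment as `new_layout`.
    let raw = unsafe {
        match current {
            Some((ptr, old_layout)) => alloc::realloc(ptr.as_ptr(), old_layout, new_layout.size()),
            None => alloc::alloc(new_layout),
        }
    };
    NonNull::new(raw).unwrap_or_else(|| alloc::handle_alloc_error(new_layout))
}

impl<T> Drop for MiniRawVec<T> {
    fn drop(&mut self) {
        if self.cap != 0 {
            let layout = Layout::array::<T>(self.cap).expect("valid layout");
            // SAFETY: the block was allocated by `finish_grow` with this layout.
            unsafe { alloc::dealloc(self.ptr.as_ptr().cast::<u8>(), layout) };
        }
    }
}

fn main() {
    let mut rv: MiniRawVec<u64> = MiniRawVec::new();
    rv.grow_amortized(0, 1);
    unsafe { rv.ptr.as_ptr().write(7) };
    let len = rv.cap;
    rv.grow_amortized(len, 1);
    let first = unsafe { rv.ptr.as_ptr().read() };
    // prints "cap = 8, first = 7"
    println!("cap = {}, first = {}", rv.cap, first);
}
```

Only the small generic wrapper is duplicated for each `T`; the allocation logic in `finish_grow` exists once, which is the kind of saving that `cargo-llvm-lines` makes visible.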
r? @ghost