-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] NRVO pass for acyclic CFGs #71003
Conversation
r? @eddyb (rust_highfive has picked a reviewer for you, use r? to override) |
Is this related to #47954 ? |
Yeah, it's a simpler version of that |
@slanterns Yeah, it's a simpler approximation - oops, @jonas-schievink already replied. Failure seems to be formatting related, so I'm going ahead with: |
Awaiting bors try build completion |
⌛ Trying commit 7c9604caf7dd0a32fc3384571aaf27f36a3cbba4 with merge 88171401698a07eb245f0c4ac15085e1128ec7a7... |
@bors try @rust-timer queue |
⌛ Trying commit cfaaae36b313af2db65274cffe3b5ca7d7275116 with merge d50710b6cf33db9d82496ae2032d7643605fb0e4... |
Looks like neither of the racing @bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit cfaaae36b313af2db65274cffe3b5ca7d7275116 with merge c0ce9fadf4f57ef142f608c346e5369c80fc39f5... |
☀️ Try build successful - checks-azure |
Queued c0ce9fadf4f57ef142f608c346e5369c80fc39f5 with parent 9682f0e, future comparison URL. |
Finished benchmarking try commit c0ce9fadf4f57ef142f608c346e5369c80fc39f5, comparison URL. |
|
@bjorn3 To be clear, this run was for fun (since the approach is arguably bruteforce right now). #47954 initially had regressions like 320x and I remember even 9000x but can't confirm that because the perf data from back then is gone now. As for |
Wow, I'm impressed that building rustc with this optimization actually made it quite a bit faster in some cases! That makes me think making this optimization fast but approximate could already be a fairly big gain. |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
Let's see how the performance of an NRVO-optimized rustc changes. @bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit 03e04dbe2e098ca3240a2049bc337bcc5f31f006 with merge bd02092214fcef215d0af94e0507e58e5adf802a... |
Instead of collecting all uses every time. Should be faster than before, but there's probably still things to improve.
@bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit fa95297 with merge f7f520c3eaac507bffefad8b932af835ed31a2b5... |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
☀️ Try build successful - checks-azure |
Queued f7f520c3eaac507bffefad8b932af835ed31a2b5 with parent edc0258, future comparison URL. |
Finished benchmarking try commit f7f520c3eaac507bffefad8b932af835ed31a2b5, comparison URL. |
Well that's marginally less disastrous, but I've expected the optimization to make more of a difference. Maybe the expensive part is walking the CFG. Now let's actually only run it on rustc @bors try @rust-timer queue |
Awaiting bors try build completion |
⌛ Trying commit 2650c3a with merge d601b3227e7cb970a7dbdb29498252d6c133701a... |
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
☀️ Try build successful - checks-azure |
Queued d601b3227e7cb970a7dbdb29498252d6c133701a with parent 351eefe, future comparison URL. |
So, yeah, quite a lot of stress indeed. It is basically the worst-case scenario for this pass, at least how I've written it. For every candidate assignment, the pass will walk the containing basic block forwards and backwards to find conflicting uses of the involved locals. Looking at the MIR it seems that there's one candidate assignment for every tuple constructed, so 65535 in total. For each of these, it'll iterate over all 458745 statements. My plans for speeding up this pass involve caching CFG walks, which does not matter at all in this case since the CFG is just one block in size. Not very encouraging. I suppose the most promising way to do this would be to go with the original approach of calculating a conflict matrix between relevant locals. For now, I'll close this PR though. I've learned a lot while implementing this. |
…n, r=oli-obk Reading from the return place is fine Const eval thinks that reading from local `_0` is UB, but it isn't. `_0` is just a normal local like any other, and codegen handles it that way too. The only special thing is that the `Return` terminator will read from it. I've hit these errors while working on an NRVO pass that can merge other locals with `_0` in rust-lang#71003. r? @oli-obk
Implement a generic Destination Propagation optimization on MIR This takes the work that was originally started by @eddyb in rust-lang#47954, and then explored by me in rust-lang#71003, and implements it in a general (ie. not limited to acyclic CFGs) and dataflow-driven way (so that no additional infrastructure in rustc is needed). The pass is configured to run at `mir-opt-level=2` and higher only. To enable it by default, some followup work on it is still needed: * Performance needs to be evaluated. I did some light optimization work and tested against `tuple-stress`, which caused trouble in my last attempt, but didn't go much in depth here. * We can also enable the pass only at `opt-level=2` and higher, if it is too slow to run in debug mode, but fine when optimizations run anyways. * Debuginfo needs to be fixed after locals are merged. I did not look into what is required for this. * Live ranges of locals (aka `StorageLive` and `StorageDead`) are currently deleted. We either need to decide that this is fine, or if not, merge the variable's live ranges (or remove these statements entirely – rust-lang#68622). Some benchmarks of the pass were done in rust-lang#72635.
Implement a generic Destination Propagation optimization on MIR This takes the work that was originally started by `@eddyb` in rust-lang#47954, and then explored by me in rust-lang#71003, and implements it in a general (ie. not limited to acyclic CFGs) and dataflow-driven way (so that no additional infrastructure in rustc is needed). The pass is configured to run at `mir-opt-level=2` and higher only. To enable it by default, some followup work on it is still needed: * Performance needs to be evaluated. I did some light optimization work and tested against `tuple-stress`, which caused trouble in my last attempt, but didn't go much in depth here. * We can also enable the pass only at `opt-level=2` and higher, if it is too slow to run in debug mode, but fine when optimizations run anyways. * Debuginfo needs to be fixed after locals are merged. I did not look into what is required for this. * Live ranges of locals (aka `StorageLive` and `StorageDead`) are currently deleted. We either need to decide that this is fine, or if not, merge the variable's live ranges (or remove these statements entirely – rust-lang#68622). Some benchmarks of the pass were done in rust-lang#72635.
No description provided.