-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rollup of 19 pull requests #50608
Closed
Closed
Rollup of 19 pull requests #50608
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Fixes rust-lang#50438. I'll make this more robust later for rust-lang#49815.
Because they are simple and hot. This change speeds up some incremental runs of a few rustc-perf benchmarks, the best by 3%.
This avoids a decent number of allocations, enough to speed up incremental runs of many rustc-benchmarks, the best by 2%.
This commit is spawned out of a performance regression investigation in rust-lang#50496. In tracking down this regression it turned out that the `expand_statements` function in the compiler was taking quite a long time. Further investigation showed two key properties: * The function was "fast" on glibc 2.24 and slow on glibc 2.23 * The hottest function was memmove from glibc Combined together it looked like glibc gained an optimization to the memmove function in 2.24. Ideally we don't want to rely on this optimization, so I wanted to dig further to see what was happening. The hottest part of `expand_statements` was `Drop for Drain` in the call to `splice` where we insert new statements into the original vector. This *should* be a cheap operation because we're draining and replacing iterators of the exact same length, but under the hood memmove was being called a lot, causing a slowdown on glibc 2.23. It turns out that at least one of the optimizations in glibc 2.24 was that `memmove` where the src/dst are equal becomes much faster. [This program][prog] executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting how glibc 2.24 is optimizing `memmove` if the src/dst are equal. And all that brings us to what this commit itself is doing. The change here is purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being copied doesn't actually need to be copied. For normal usage of just `Drain` itself this check isn't really necessary, but because `Splice` internally contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the regression on Windows where `memmove` looks to not have this optimization. Note that the way `splice` was called in `expand_statements` would cause a quadratic number of elements to be copied via `memmove` which is likely why the tuple-stress benchmark showed such a severe regression. Closes rust-lang#50496 [prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
also make a drive-by typo fix
When the RawVec::try_reserve* methods were added, they took the place of the ::reserve* methods in the source file, and new ::reserve* methods wrapping the new try_reserve* methods were created. But the documentation didn't move along, such that: - reserve_* methods are barely documented. - try_reserve_* methods have unmodified documentation from reserve_*, such that their documentation indicate they are panicking/aborting. This moves the documentation back to the right methods, with a placeholder documentation for the try_reserve* methods.
…ducts, r=michaelwoerister Make DepGraph::previous_work_products immutable Fixes rust-lang#50501 r? @michaelwoerister
… r=Zoxc Don't use Lock for heavily accessed CrateMetadata::cnum_map. The `cnum_map` in `CrateMetadata` is used for two things: 1. to map `CrateNums` between crates (used a lot during decoding) 2. to construct the (reverse) post order of the crate graph For the second case, we need to modify the map after the fact, which is why the map is wrapped in a `Lock`. This is bad for the first case, which does not need the modification and does lots of small reads from the map. This PR splits case (2) out into a separate `dependencies` field. This allows to make the `cnum_map` immutable (and shifts the interior mutability to a less busy data structure). Fixes rust-lang#50502 r? @Zoxc
Make CrateNum allocation more thread-safe. This PR makes sure that we can't have race conditions when assigning CrateNums. It's a slight improvement but a larger refactoring of the CrateStore/CrateLoader infrastructure would be good, I think. r? @Zoxc
…petrochenkov Inline `Span` methods. Because they are simple and hot. This change speeds up some incremental runs of a few rustc-perf benchmarks, the best by 3%. Here are the ones with a speedup of at least 1%: ``` coercions avg: -1.1% min: -3.4% max: -0.2% html5ever-opt avg: -0.8% min: -1.7% max: -0.2% clap-rs-check avg: -0.3% min: -1.4% max: 0.7% html5ever avg: -0.7% min: -1.2% max: -0.4% html5ever-check avg: -0.9% min: -1.1% max: -0.8% clap-rs avg: -0.4% min: -1.1% max: -0.1% crates.io-check avg: -0.8% min: -1.0% max: -0.6% serde-opt avg: -0.6% min: -1.0% max: -0.3% ```
…elwoerister Use SmallVec for DepNodeIndex within dep_graph. This avoids a decent number of allocations, enough to speed up incremental runs of many rustc-benchmarks, the best by 2%. Here are the rustc-perf benchmarks that showed an improvement of at least 1% on one run: ``` unused-warnings-check avg: -1.7% min: -2.4% max: 0.0% unused-warnings-opt avg: -1.4% min: -2.0% max: 0.0% unused-warnings avg: -1.4% min: -2.0% max: -0.0% tokio-webpush-simple-check avg: -1.0% min: -1.7% max: 0.0% futures-opt avg: -0.9% min: -1.6% max: 0.0% encoding avg: -1.2% min: -1.6% max: -0.6% encoding-check avg: -0.9% min: -1.6% max: 0.0% encoding-opt avg: -0.8% min: -1.6% max: -0.1% futures avg: -0.9% min: -1.5% max: 0.0% futures-check avg: -0.9% min: -1.5% max: 0.1% regression-31157-check avg: -0.9% min: -1.5% max: 0.0% regex avg: -0.6% min: -1.4% max: 0.0% regression-31157-opt avg: -0.5% min: -1.4% max: 0.1% regression-31157 avg: -0.7% min: -1.4% max: 0.2% regex-opt avg: -0.6% min: -1.4% max: 0.1% hyper-check avg: -0.8% min: -1.4% max: -0.1% regex-check avg: -1.0% min: -1.4% max: 0.0% hyper-opt avg: -0.7% min: -1.4% max: -0.1% hyper avg: -0.7% min: -1.3% max: 0.1% piston-image-opt avg: -0.4% min: -1.3% max: 0.0% tokio-webpush-simple-opt avg: -0.3% min: -1.3% max: 0.0% piston-image-check avg: -0.5% min: -1.3% max: -0.0% syn-opt avg: -0.5% min: -1.3% max: 0.0% clap-rs-check avg: -0.3% min: -1.3% max: 0.2% piston-image avg: -0.5% min: -1.2% max: 0.1% syn avg: -0.5% min: -1.2% max: 0.1% syn-check avg: -0.6% min: -1.2% max: -0.1% issue-46449-opt avg: -0.4% min: -1.2% max: -0.1% parser-check avg: -0.7% min: -1.2% max: 0.1% issue-46449 avg: -0.5% min: -1.2% max: -0.0% ```
…r=alexcrichton Allow for specifying a linker plugin for cross-language LTO This PR makes the `-Zcross-lang-lto` flag optionally take the path to the `LLVMgold.so` linker plugin. If this path is specified, `rustc` will invoke the linker with the correct arguments (i.e. `-plugin` and various `-plugin-opt`s). This can be used to ergonomically enable cross-language LTO for Rust programs with C/C++ dependencies: ``` clang -O2 test.c -otest.o -c -flto=thin llvm-ar -rv libxxx.a test.o rustc -L. main.rs -Zcross-lang-lto=/usr/lib64/LLVMgold.so -O -Clink-arg=-fuse-ld=gold ``` - Note that in theory this should work with Gold, LLD, and newer versions of binutils' LD but on my current system I could only get it to work with Gold. - Also note that this will work best if the Clang version and Rust's LLVM version are close enough. Clang 6.0 works well with the current nightly. r? @alexcrichton
Clarify in the docs that `mul_add` is not always faster. Fixes rust-lang#49842. Other resources: - https://users.rust-lang.org/t/why-does-the-mul-add-method-produce-a-more-accurate-result-with-better-performance/1626 - https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation
Don't require clippy/miri for beta r? @kennytm cc @alexcrichton I'm trying this out locally atm to see if it works as I think it should. Not sure how to test it for real except wait for the next beta. fixes rust-lang#50557
…SimonSapin add fn `into_inner(self) -> (Idx, Idx)` to RangeInclusive (rust-lang#49022) adds `into_inner(self) -> (Idx, Idx)` to RangeInclusive rust-lang#49022 (comment)
…fackler std: Avoid `ptr::copy` if unnecessary in `vec::Drain` This commit is spawned out of a performance regression investigation in rust-lang#50496. In tracking down this regression it turned out that the `expand_statements` function in the compiler was taking quite a long time. Further investigation showed two key properties: * The function was "fast" on glibc 2.24 and slow on glibc 2.23 * The hottest function was memmove from glibc Combined together it looked like glibc gained an optimization to the memmove function in 2.24. Ideally we don't want to rely on this optimization, so I wanted to dig further to see what was happening. The hottest part of `expand_statements` was `Drop for Drain` in the call to `splice` where we insert new statements into the original vector. This *should* be a cheap operation because we're draining and replacing iterators of the exact same length, but under the hood memmove was being called a lot, causing a slowdown on glibc 2.23. It turns out that at least one of the optimizations in glibc 2.24 was that `memmove` where the src/dst are equal becomes much faster. [This program][prog] executes in ~2.5s against glibc 2.23 and ~0.3s against glibc 2.24, exhibiting how glibc 2.24 is optimizing `memmove` if the src/dst are equal. And all that brings us to what this commit itself is doing. The change here is purely to `Drop for Drain` to avoid the call to `ptr::copy` if the region being copied doesn't actually need to be copied. For normal usage of just `Drain` itself this check isn't really necessary, but because `Splice` internally contains `Drain` this provides a nice speed boost on glibc 2.23. Overall this should fix the regression seen in rust-lang#50496 on glibc 2.23 and also fix the regression on Windows where `memmove` looks to not have this optimization. Note that the way `splice` was called in `expand_statements` would cause a quadratic number of elements to be copied via `memmove` which is likely why the tuple-stress benchmark showed such a severe regression. Closes rust-lang#50496 [prog]: https://gist.github.com/alexcrichton/c05bc51c6771bba5ae5b57561a6c1cd3
… r=frewsxcv Move "See also" disambiguation links for primitive types to top Closes rust-lang#50384. <details> <summary>Images</summary> ![rust-slice](https://user-images.githubusercontent.com/1411280/39843148-caa41c3e-53b7-11e8-8123-b57c25a4d9e0.png) ![rust-isize](https://user-images.githubusercontent.com/1411280/39843146-ca94b384-53b7-11e8-85f3-3f5e5d353a05.png) </details> r? @steveklabnik
Restore RawVec::reserve* documentation When the RawVec::try_reserve* methods were added, they took the place of the ::reserve* methods in the source file, and new ::reserve* methods wrapping the new try_reserve* methods were created. But the documentation didn't move along, such that: - reserve_* methods are barely documented. - try_reserve_* methods have unmodified documentation from reserve_*, such that their documentation indicate they are panicking/aborting. This moves the documentation back to the right methods, with a placeholder documentation for the try_reserve* methods.
…ichaelwoerister Remove unnecessary mutable borrow and resizing in DepGraph::serialize I might be mistaken, but I noticed this whilst in this file for something else. It appears that this mutable borrow is unnecessary and since it's locking it should be removed. The resizing looks redundant since nothing additional is added to the fingerprints in this function, so that can also be removed.
…richton Retry when downloading the Docker cache. As a safety measure, prevent spuriously needing to rebuild the docker image in case the network was reset while downloading. Also, adjusted the retry function to insert a sleep between retries, because retrying immediately will often just hit the same issue.
r? @cramertj (rust_highfive has picked a reviewer for you, use r? to override) |
rust-highfive
added
the
S-waiting-on-review
Status: Awaiting review from the assignee but also interested parties.
label
May 10, 2018
@bors: r+ p=10 |
📌 Commit bc516e1 has been approved by |
bors
added
S-waiting-on-bors
Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
and removed
S-waiting-on-review
Status: Awaiting review from the assignee but also interested parties.
labels
May 10, 2018
🔒 Merge conflict |
bors
added
S-waiting-on-author
Status: This is awaiting some action (such as code changes or more information) from the author.
and removed
S-waiting-on-bors
Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
labels
May 10, 2018
☔ The latest upstream changes (presumably #50395) made this pull request unmergeable. Please resolve the merge conflicts. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
rollup
A PR which is a rollup
S-waiting-on-author
Status: This is awaiting some action (such as code changes or more information) from the author.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Successful merges:
Span
methods. #50564 (InlineSpan
methods.)mul_add
is not always faster. #50572 (Clarify in the docs thatmul_add
is not always faster.)into_inner(self) -> (Idx, Idx)
to RangeInclusive (#49022) #50574 (add fninto_inner(self) -> (Idx, Idx)
to RangeInclusive (Tracking issue forstart
,end
andnew
methods of RangeInclusive #49022))ptr::copy
if unnecessary invec::Drain
#50575 (std: Avoidptr::copy
if unnecessary invec::Drain
)Failed merges: