-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Sharded
an enum and specialize it for the single thread case
#114860
Conversation
r? @b-naber (rustbot has picked a reviewer for you, use r? to override) |
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
⌛ Trying commit c737c62 with merge c2631968523852a0815358b7f3e17eb91a677173... |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (c2631968523852a0815358b7f3e17eb91a677173): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 632.472s -> 632.68s (0.03%) |
/// a single shard to be used for greater cache efficiency. | ||
/// A single field is used when the compiler uses only one thread. | ||
pub enum Sharded<T> { | ||
Single(Lock<T>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use RefCell
directly if it brought visible performeance benefit. Then we need to add Sharded
to marker.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't work as we need to return a single type for both cases. With #111713 landed however we can add a lock operation to Sharded
that combines the branches on is_dyn_thread_safe
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea I mean Single(RefCell<T>)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh fn lock_shards
requires the return type Lock<T>
. Just forget it :)
This comment has been minimized.
This comment has been minimized.
Thanks! It looks good to me. cc @nnethercote @cjgillot |
use std::borrow::Borrow; | ||
use std::collections::hash_map::RawEntryMut; | ||
use std::hash::{Hash, Hasher}; | ||
use std::mem; | ||
|
||
#[cfg(parallel_compiler)] | ||
// 32 shards is sufficient to reduce contention on an 8-core Ryzen 7 1700, | ||
// but this should be tested on higher core count CPUs. How the `Sharded` type gets used | ||
// may also affect the ideal number of shards. | ||
const SHARD_BITS: usize = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still used by get_shard_hash
. Should it keep both possible values? Or should get_shard_hash
be refactored out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_shard_hash
isn't actually used for anything with cfg(not(parallel_compiler))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, should it be cfg(parallel_compiler)
to avoid accidental use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function is still used with cfg(not(parallel_compiler))
, but it's value doesn't end up used with optimizations applied as get_shard_by_index
ignores the hash then.
@bors r+ |
⌛ Testing commit 0823f0c with merge 9cd8d80c729f199c56bac370baa2684a7235ae8e... |
The job Click to see the possible cause of the failure (guessed by this bot)
|
💔 Test failed - checks-actions |
That test failure looks unrelated to this PR. It's also ignored on MSVC. I wonder why MINGW is left out? |
It uses different debuginfo and unwinding so often unaffected by many MSVC issues regarding unwinding. |
Let's do a retry to see if the test is flaky? |
@bors r+ |
💡 This pull request was already approved, no need to approve it again.
|
☀️ Test successful - checks-actions |
Finished benchmarking commit (840ed5d): comparison URL. Overall result: ❌✅ regressions and improvements - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 636.596s -> 632.475s (-0.65%) |
This changes
Sharded
to use a single shard by an enum, reducing the size ofSharded
for greater cache efficiency.Performance improvement with 1 thread and
cfg(parallel_compiler)
:I did see an unexpected 0.23% change for the serial compiler, so this could use a perf run to see if that reproduces.
cc @SparrowLii