-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make mk_attr_id
part of ParseSess
#101313
Conversation
r? @fee1-dead (rust-highfive has picked a reviewer for you, use r? to override) |
compiler/rustc_ast/src/attr/mod.rs
Outdated
// starting value of AttrId in each worker thread. | ||
// The `index` is the index of the worker thread. | ||
// This ensures that the AttrId generated in each thread is unique. | ||
AttrIdGenerator(WorkerLocal::new(|index| Cell::new((index as u32).reverse_bits()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with parallel compiler code, but is there a cap as to how many threads can be used for parallel rustc? If there are too many then the actual usable bits would be decreased.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is a cap. AFAIK, the number of threads is usually no more than 64. Besides, as the number of threads increases, the number of AttrIds that need to be assigned per thread will decreases relatively, so I don't think this is a problem.
I understand that the main motivation is performance in parallel environment. Do you have a measurement of the perf improvement? |
I haven't learned about a good benchmark for measuring the efficiency of parallel compilation, which is also in my follow-up implementation plan. I guess for now we just need to guarantee that there is no negative impact on the efficiency in non-parallel mode. So I think we can run rustc perf directly. |
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit f9234777e44bde1b27786e9f4300b0ee3323c5f9 with merge 7e04d9038009c47e6b24a62aab7fa9d31c71706a... |
☀️ Try build successful - checks-actions |
Queued 7e04d9038009c47e6b24a62aab7fa9d31c71706a with parent 5b4bd15, future comparison URL. |
Finished benchmarking commit (7e04d9038009c47e6b24a62aab7fa9d31c71706a): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Footnotes |
From the perf report, this PR looks like a great idea to improve perf. |
Sure, I added the corresponding modifications. |
@bors r=cjgillot |
@SparrowLii: 🔑 Insufficient privileges: Not in reviewers |
@cjgillot It looks like I don't have the privilege to r= |
☔ The latest upstream changes (presumably #101757) made this pull request unmergeable. Please resolve the merge conflicts. |
87cd6ff
to
bfc4f2e
Compare
@bors r+ |
☀️ Test successful - checks-actions |
Finished benchmarking commit (750bd1a): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Footnotes |
It's possible that |
Thanks, your suggestion is very valuable! In fact, we don't have good tools for measuring the performance of compilers in parallel environments at current. I will try this tool then! |
This happened to break the parallel compiler due to |
You mean Or we should add this |
I think it just ends up spawning the global Rayon thread pool. I kind of have a workaround in #107782, but it's not particularly clean. |
Updates #48685
The current
mk_attr_id
uses theAtomicU32
type, which is not very efficient and adds a lot of lock contention in a parallel environment.This PR refers to the task list in #48685, uses
mk_attr_id
as a method of theAttrIdGenerator
struct, and adds a new fieldattr_id_generator
toParseSess
.AttrIdGenerator
uses theWorkerLocal
, which has two advantages: 1.Cell
is more efficient thanAtomicU32
, and does not increase any lock contention. 2. We put the index of the work thread in the first few bits of the generatedAttrId
, so that theAttrId
generated in different threads can be easily guaranteed to be unique.cc @cjgillot