The original issue provides a great introduction to and overview of the problem, as well as a proposed solution; but there are relevant issues and details missing from it that I would like to add. I would also like to bring it up to date and organize it so the problem can be addressed more cohesively, forming a long-term solution.
I'd like to thank @tgnottingham for all the posts and work he did on this, as they gave me a lot of insight into the overall issue and its resolution. I'd also like to mention @wesleywiser for providing an apt comparison to the multiway partitioning problem. The general solution lies in minimizing the "makespan" of the problem (CGU compilation finish time), which can be achieved by minimizing the largest "sum" (CGU).
However, before we can attempt to minimize the largest CGU, we must address all of the issues currently plaguing CGU size in general. Monomorphization partitioning is the obvious candidate for consideration, and it's possible the issues regarding inlining and drops take place there; these issues may also be related. I am assuming that a lot of programs use match statements to initialize struct fields, so the code bloat there is something that should be looked into as well.
More importantly, while CGU scheduling is correctly ordered by size, the metric used to gauge CGU size is supposedly not correct, which suggests that the first order of business is fixing it. I plan on tackling this first, so any advice that can be offered in that regard would be immensely appreciated.
Thoughts
While it is possible that small improvements can be made over time to yield performance wins, these tweaks may not be cohesive or consider other necessary parts of the compiler. To address this effectively, we need to organize all relevant information into one place and create a roadmap that resolves these issues and gets us to the high-level general solution of simply minimizing the largest CGU. In general, we need collaboration.
Metrics
We need a way to track the metrics we are looking to optimize. In general, we need to track different size and time metrics for individual CGUs, from when they are initially partitioned to when they are ready to be linked. We also need metrics for codegen in general, so we can see how the size/time of individual CGUs affects codegen overall. Right now the only metric I have thought of for tracking this is total time from partitioning to linking. We also need to be able to determine whether a CGU is large because it started out large, or because it has been merged with other CGUs.
Metric Types:
- CGU: metrics for individual CGUs
- CG: metrics for codegen in general

| Type | Metric | Description |
|------|--------|-------------|
| CGU | size | Total LLVM instructions |
| CGU | compilation time | Total time to translate LLVM-IR to a compiled CGU |
| CGU | optimization time | Total time to completely optimize a CGU |
| CGU | total time | compilation time + optimization time |
| CG | total time | Total time from partitioning to linking (when all compiled CGUs are ready to be linked, not when linking has finished) |
Road to Optimization
This is not permanent; I am hoping others offer suggestions and improvements.

- Add per-CGU metrics to perf.r-l.o
  - Map CGUs to a CGU ID
- Correct the size metric; track CGU size
  - Add support for determining whether CGUs are large because they were merged or because they began that way
  - Should the metric simply be the number of LLVM instructions in a CGU?
- Address the inlining implementation
  - Going to need a lot of help here
- Address `Drop`s always being inlined
  - And here
- Address LLVM bloat when using match struct-field initialization
  - And here
- Implement an algorithm to identify the largest CGU and minimize it
  - Should the number of CGUs minimized be determined by the number of cores?
Thanks for @-ing me, @Sl1mb0. I've been revisiting this very subject lately. I'll try to add my thoughts and/or open some issues that I can link to this when I have the time.
Updated tracking issue for #82685
References
- Original Issue
- CGU Size Metric
- CGU Organization
- CGU Compile Time
- LLVM Bloat Match Statement
- Inline Implementation
- Drops Always Inlined
- Drops In LLVM IR
- Suboptimal Codegen Parallelism
- Wesley Wiser Comparison
- Multiway Partitioning