The original issue provides a great introduction to and overview of the problem, as well as a proposed solution; but there are relevant issues and details missing from it that I would like to add. I would also like to bring it up to date and organize it so the problem can be addressed more cohesively, forming a long-term solution.
I'd like to thank @tgnottingham for all the posts and work he did on this, as they gave me a lot of insight into the overall issue and its resolution. I'd also like to mention @wesleywiser for providing an apt comparison to the multiway partitioning problem. The general solution lies in minimizing the "makespan" of the problem (CGU compilation finish time), which can be achieved by minimizing the largest "sum" (CGU).
However, before we can attempt to minimize the largest CGU, we must address all of the issues currently plaguing CGU size in general. Monomorphization partitioning is the obvious candidate for consideration, and it's possible the issues regarding inlining and drops take place there; these issues may also be related. I am assuming that a lot of programs use match statements to initialize struct fields, so the code bloat there is something that should be looked into as well.
More importantly, while CGU scheduling is correctly ordered by size, the metric used to gauge CGU size is supposedly not correct, which suggests that the first order of business is fixing it. I plan on tackling this first, so any advice that can be offered in that regard would be immensely appreciated.
Thoughts
While it is possible that small improvements can be made over time to yield performance wins, these tweaks may not be cohesive or consider other necessary parts of the compiler. To address this effectively, we need to organize all relevant information into one place and create a roadmap that resolves these issues and gets us to the high-level general solution of simply minimizing the largest CGU. In general, we need collaboration.
Metrics
We need a way to track the metrics we are looking to optimize. In general, we need to track different size and time metrics for individual CGUs, from when they are initially partitioned to when they are ready to be linked. We also need metrics for codegen in general, so we can see how the size/time of individual CGUs affects codegen overall. Right now the only metric I have thought of for tracking this is total time from partitioning to linking. We also need to be able to determine whether a CGU is large because it started out large, or because it has been merged with other CGUs.
Metric Types:
- CGU: metrics for individual CGUs
- CG: metrics for codegen in general

| Type | Metric | Description |
|------|--------|-------------|
| CGU | size | Total LLVM instructions |
| CGU | compilation time | Total time to translate LLVM-IR to a compiled CGU |
| CGU | optimization time | Total time to completely optimize a CGU |
| CGU | total time | compilation time + optimization time |
| CG | total time | Total time from partitioning to linking (when all compiled CGUs are ready to be linked, not when linking has finished) |
Road to Optimization
This is not permanent; I am hoping others offer suggestions and improvements.

- Add per-CGU metrics to perf.r-l.o
  - Map CGUs to a CGU ID
- Correct the size metric; track CGU size
  - Add support for determining whether CGUs are large because they were merged or because they began that way
  - Should the metric simply be the number of LLVM instructions in a CGU?
- Address the inlining implementation
  - Going to need a lot of help here
- Address `Drop`s always being inlined
  - And here
- Address LLVM bloat when using match struct-field initialization
  - And here
- Implement an algorithm to identify the largest CGU and minimize it
  - Should the number of CGUs minimized be determined by the number of cores?
Thanks for @-ing me, @Sl1mb0. I've been revisiting this very subject lately. I'll try to add my thoughts and/or open some issues that I can link to this when I have the time.
Updated tracking issue for #82685
References
- Original Issue
- CGU Size Metric
- CGU Organization
- CGU Compile Time
- LLVM Bloat Match Statement
- Inline Implementation
- Drops Always Inlined
- Drops In LLVM IR
- Suboptimal Codegen Parallelism
- Wesley Wiser Comparison
- Multiway Partitioning