-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add incremental optimization levels #13464
Conversation
I suggest following option names:
I strongly believe, default mode should be with optimizations enabled, since it is most of users use first. Given it doesn't harm compilation time much and provides significant improvement to performance of resulting binary, I don't see why non-optimized mode should remain default. |
@funny-falcon Long time no see! |
src/compiler/crystal/compiler.cr
Outdated
builder.use_inliner_with_threshold = 275 | ||
when OptimizationMode::Level1 | ||
builder.opt_level = 1 | ||
builder.use_inliner_with_threshold = 150 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is other useful option to speedup compilation:
builder.disable_unroll_loops = true
But I don't know, how to match it in optimize_with_new_pass_manager
for newer LLVM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like, there is need to add LLVM::PassBuilderOptions#set_loop_unrolling which should map to LLVMPassBuilderOptionsSetLoopUnrolling
This is would be quite bad because create confusion, level2 here is not even close to gcc -O2. In gcc -O2 is very good level of optimization, but in crystal it would be much slower, because use separate module compilation (so no inlining). |
LLVM::CodeGenOptLevel::Less | ||
else | ||
LLVM::CodeGenOptLevel::None | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anybody know what for this opt_level, is it options for linker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Roughly speaking, this level is required for the codegen passes, whereas the other one is for the optimization passes
I propose use --level1 or --level2 instead of default, because for a project, the first time build time is always can be ignored, because, we only need it to be do once, right? But, we should keep old default mode for user use it manually. |
I like idea to use level1 as default. minuses:
pluses:
|
Still no progress? pitty |
why not merge it? it not change current compilation (default and --release), only add more options for build customization. |
Yeah, quite strange unwilling to improve user experience. |
I'm sorry this has been sitting for so long. It's not unwillingness. There's a lot of review work and limited resources. Sometimes PRs fall through the crack. 😢 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great overal! I have some suggestions for small improvements.
And merge conflicts need to be resolved via git merge master
.
src/compiler/crystal/compiler.cr
Outdated
current_bc_flags = "#{@codegen_target}|#{@mcpu}|#{@mattr}|#{@release}|#{@link_flags}|#{@mcmodel}" | ||
bc_flags_filename = "#{output_dir}/bc_flags" | ||
current_bc_flags = "#{@codegen_target}|#{@mcpu}|#{@mattr}|#{@link_flags}|#{@mcmodel}" | ||
bc_flags_filename = "#{output_dir}/bc_flags#{optimization_mode_suffix}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Putting the optimization level in the filename seems like a great idea. It changes from the current behaviour where it's only written in the file contents.
I'm wondering about the effects of this.
I suppose it means the caches for different optimization modes won't override each other.
Does it mean different caches stay around? But what about the actual data files in output_dir
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this PR, all objects files was mixed, but release used only 2 files: _main.o and _main.bc, so mixing was not so important. But in this PR added many modes, so every mode have separate compiled objects(.o and .bc) for each class.
About the actual data, still this is only cache files, its became useless, it can be removed just with rm -rf ~/.cache/crystal
. And of course all caches for each mode would stay around. But in every day usage this cache would be the similar as before this PR, because default compile - would generate .o0 files, and release would generate just 2 files _main.o.o3 _main.bc.o3
src/compiler/crystal/compiler.cr
Outdated
@@ -755,7 +810,7 @@ module Crystal | |||
end | |||
|
|||
if must_compile | |||
compiler.optimize llvm_mod if compiler.release? | |||
compiler.optimize llvm_mod if compiler.optimization_mode != OptimizationMode::O0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Predicate method is type safe and more concise:
compiler.optimize llvm_mod if compiler.optimization_mode != OptimizationMode::O0 | |
compiler.optimize llvm_mod unless compiler.optimization_mode.o0? |
@@ -145,7 +145,7 @@ class Crystal::Program | |||
|
|||
# Although release takes longer, once the bc is cached in .crystal | |||
# the subsequent times will make program execution faster. | |||
host_compiler.release = true | |||
host_compiler.release! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: According to #13505 (comment) ff. it seems to be more efficient to build macros not in a single module.
It's probably best to leave the current behaviour in place here. We can follow up with a change to macro generation config. Since the host compiler inherits its configuration from the target compiler there's more involved than just switching this to host_compiler.optimization_mode = :03
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about changing this, because this is for compile run
macroses, as i understand. Which should have fast runtime, and this is done only once. So release! here is in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kostya macroses should run "fast-enough". Compiled without "single-module" is certainly fast enough for macroses, since they are not computation heavy. I doubt you could measure difference, I bet case of beer on it. But delta of time consumed by compilation of macroses is certainly measurable.
rebased, squashed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
P.S. Next time, please do not force push. Just merge and amend new commits. That makes reviews easier. Thanks. 🙏 (ref: https://github.com/crystal-lang/crystal/blob/master/CONTRIBUTING.md#making-good-pull-requests).
Cool. |
UPDATE: Revised description reflecting the actually merged changes (by @straight-shoota):
Adds four distinct optimization levels:
-O0
: No optimization-O1
: Low optimization-O2
: Middle optimization-O3
: High optimizationEach level activates the respective LLVM
RunPasses
andCodeGenOptLevel
optimizations.-O3
corresponds to the existing release mode and-O0
corresponds to the default non-release mode.-O0
remains the default and--release
is equivalent to-O3 --single-module
.Effectively, this introduces two optimization choices between the previous full or nothing. And it's now possible to use high optimization without
--single-module
.Each optimization level increasingly trades compile time performance for runtime performance. The exact effect depends on individual code bases. But in general, even slight optimizations can significantly improve runtime performance with barely noticable impact at compile time.
When using any kind of optimizations
--single-module
probably has the biggest effect on both compile and runtime performance. It enables optimizations across module boundaries but makes it impossible to generate modules in parallel.Original Post:
This is just draft, for discussion. Here i add two new levels of optimization (--level2 and --level1) to existing ones
--release
anddefault
.Reason: I have quite big project which is compiled very slow, and also have big start time (it read data from db, and do some heavy calculations on it). So if i compile it without optimization it starts very slow, if i compile it with --release, compilation take too long. So debugging such project is real pain. By adding new incremental optimization level, i can solve this problem.
Of course this level2, and level1 optimization not even close in terms of performance to
--release
option, because it optimize every module (which is class in crystal), unlike--release
which optimize united module (using hard inlining). But it fast enough to debug my project.This is results for my project:
--release
: initial compile: 61s, incremental compile(change 1 file): 61s, start time: 2s--level2
: initial compile: 12.6s, incremental compile(change 1 file): 5.3s, start time: 7s--level1
: initial compile: 12s, incremental compile(change 1 file): 5.2s, start time: 7.5sdefault
: initial compile: 7,4s, incremental compile(change 1 file): 5.3s, start time: 23.5sBuild crystal compiler (run time here is recompile crystal by new binary with clean cache and level0, to reduce llvm interference):
--release
: initial compile: 7m38,818s, incremental compile(change 1 file): 7m34,734s, run time: 0m36,086s--level2
: initial compile: 1m5,084s, incremental compile(change 1 file): 0m22,129s, run time: 0m42,691s--level1
: initial compile: 1m1,384s, incremental compile(change 1 file): 0m22,106s, run time: 0m42,980sdefault
: initial compile: 0m27,484s, incremental compile(change 1 file): 0m22,071s, run time: 0m55,572s