Add incremental optimization levels #13464

kostya · 2023-05-12T21:20:01Z

UPDATE: Revised description reflecting the actually merged changes (by @straight-shoota):

Adds four distinct optimization levels:

-O0: No optimization
-O1: Low optimization
-O2: Middle optimization
-O3: High optimization

Each level activates the respective LLVM RunPasses and CodeGenOptLevel optimizations.

-O3 corresponds to the existing release mode and -O0 corresponds to the default non-release mode. -O0 remains the default and --release is equivalent to -O3 --single-module.

Effectively, this introduces two intermediate optimization choices where previously it was all or nothing. And it's now possible to use high optimization without --single-module.

Each optimization level increasingly trades compile-time performance for runtime performance. The exact effect depends on the individual code base. But in general, even slight optimizations can significantly improve runtime performance with barely noticeable impact on compile time.
When using any kind of optimization, --single-module probably has the biggest effect on both compile-time and runtime performance. It enables optimizations across module boundaries but makes it impossible to generate modules in parallel.
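The relationship between --release, -O3 and --single-module described above can be pictured with a small self-contained sketch. The BuildSettings class and its property names are hypothetical stand-ins for illustration, not the compiler's actual code; only the OptimizationMode member names come from this PR.

```crystal
# Illustrative enum with the four levels from the description.
enum OptimizationMode
  O0 = 0 # no optimization (the default)
  O1 = 1 # low
  O2 = 2 # middle
  O3 = 3 # high
end

# Hypothetical settings holder, not the real Crystal::Compiler class.
class BuildSettings
  property optimization_mode : OptimizationMode = OptimizationMode::O0
  property? single_module : Bool = false

  # Mirrors the description: --release is -O3 plus --single-module.
  def release!
    @optimization_mode = OptimizationMode::O3
    @single_module = true
  end
end

release = BuildSettings.new
release.release!

# High optimization without --single-module, which this PR makes possible.
o3_only = BuildSettings.new
o3_only.optimization_mode = OptimizationMode::O3

puts "--release: #{release.optimization_mode}, single_module=#{release.single_module?}"
puts "-O3:       #{o3_only.optimization_mode}, single_module=#{o3_only.single_module?}"
```

The two settings differ only in the single-module flag, which is the part that prevents modules from being generated in parallel.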

Original Post:

This is just a draft, for discussion. Here I add two new levels of optimization (--level2 and --level1) to the existing ones, --release and the default.
Reason: I have quite a big project which compiles very slowly and also has a long start time (it reads data from a DB and does some heavy calculations on it). So if I compile it without optimization, it starts very slowly; if I compile it with --release, compilation takes too long. So debugging such a project is a real pain. By adding new incremental optimization levels, I can solve this problem.

Of course, the level2 and level1 optimizations are not even close in performance to the --release option, because they optimize every module separately (a module is a class in Crystal), unlike --release, which optimizes one united module (using heavy inlining). But it is fast enough to debug my project.

These are the results for my project:
--release: initial compile: 61s, incremental compile (change 1 file): 61s, start time: 2s
--level2: initial compile: 12.6s, incremental compile (change 1 file): 5.3s, start time: 7s
--level1: initial compile: 12s, incremental compile (change 1 file): 5.2s, start time: 7.5s
default: initial compile: 7.4s, incremental compile (change 1 file): 5.3s, start time: 23.5s
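To make the debugging argument concrete, here is a rough sketch that adds the incremental compile time and the start time from the project results above, giving the cost of one edit-and-run cycle per mode. The idea of summing the two figures is an assumption made for illustration; the numbers themselves are copied from the list above.

```crystal
# Seconds, copied from the project results: {incremental compile, start time}.
results = {
  "--release" => {61.0, 2.0},
  "--level2"  => {5.3, 7.0},
  "--level1"  => {5.2, 7.5},
  "default"   => {5.3, 23.5},
}

# One debug iteration = recompile after a one-file change + start the program.
cycle = results.transform_values { |times| (times[0] + times[1]).round(1) }

cycle.each do |mode, seconds|
  puts "#{mode.ljust(10)} #{seconds}s per edit-run cycle"
end

# The mode with the shortest total cycle.
fastest = cycle.min_by { |_, seconds| seconds }
puts "fastest: #{fastest[0]}"
```

Under this simple model the intermediate levels come out well ahead of both extremes, which matches the motivation given in the post.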

Building the Crystal compiler (run time here is the time to recompile Crystal with the new binary, using a clean cache and level0, to reduce LLVM interference):
--release: initial compile: 7m38,818s, incremental compile(change 1 file): 7m34,734s, run time: 0m36,086s
--level2: initial compile: 1m5,084s, incremental compile(change 1 file): 0m22,129s, run time: 0m42,691s
--level1: initial compile: 1m1,384s, incremental compile(change 1 file): 0m22,106s, run time: 0m42,980s
default: initial compile: 0m27,484s, incremental compile(change 1 file): 0m22,071s, run time: 0m55,572s

Blacksmoke16 · 2023-05-12T21:22:05Z

Related: https://forum.crystal-lang.org/t/faster-release-compile-times-but-slightly-worse-performance/3864

funny-falcon · 2023-05-12T22:04:36Z

I suggest the following option names:

--no-opt - matches the current <default> mode (and --level0 in 099c080)
<default> - optimization with O1 and separate/incremental compilation (--level1)
--opt - optimization with O2 and separate/incremental compilation (--level2)
--release - remains the same: O3 and "single_module" compilation

I strongly believe the default mode should have optimizations enabled, since it is what most users use first. Given that it doesn't harm compilation time much and provides a significant improvement in the performance of the resulting binary, I don't see why the non-optimized mode should remain the default.

Sija · 2023-05-12T22:28:31Z

@funny-falcon Long time no see!

funny-falcon · 2023-05-12T22:48:09Z

src/compiler/crystal/compiler.cr

+          builder.use_inliner_with_threshold = 275
+        when OptimizationMode::Level1
+          builder.opt_level = 1
+          builder.use_inliner_with_threshold = 150


There is another useful option to speed up compilation:

builder.disable_unroll_loops = true

But I don't know how to match it in optimize_with_new_pass_manager for newer LLVM.

It looks like we need to add LLVM::PassBuilderOptions#set_loop_unrolling, which should map to LLVMPassBuilderOptionsSetLoopUnrolling.

kostya · 2023-05-13T05:07:19Z

@jkthorne

What about following the optimization levels of compilers like GCC and LLVM more closely?

Instead of "--level2" it would be "-O2"?

This would be quite bad because it would create confusion: level2 here is not even close to gcc -O2. In gcc, -O2 is a very good level of optimization, but in Crystal it would be much slower, because it uses separate module compilation (so no inlining).

kostya · 2023-05-13T08:38:47Z

src/compiler/crystal/codegen/target.cr

+                  LLVM::CodeGenOptLevel::Less
+                else
+                  LLVM::CodeGenOptLevel::None
+                end


Does anybody know what this opt_level is for? Is it an option for the linker?

Roughly speaking, this level is required for the codegen passes, whereas the other one is for the optimization passes.
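As a rough illustration of that distinction, the sketch below maps each optimization mode to a code generation level. CodeGenLevel is a local stand-in that mirrors the member names of LLVM::CodeGenOptLevel so the example runs without LLVM; the helper name codegen_level is hypothetical, and the O2 and O3 rows are an assumption, since the diff above only shows the Less and None branches.

```crystal
enum OptimizationMode
  O0
  O1
  O2
  O3
end

# Local stand-in mirroring LLVM's code generation levels;
# the real compiler would use LLVM::CodeGenOptLevel instead.
enum CodeGenLevel
  None
  Less
  Default
  Aggressive
end

# Hypothetical helper: pick the backend (machine code generation) level
# for a mode. The O2/O3 branches are assumed, not taken from the diff.
def codegen_level(mode : OptimizationMode) : CodeGenLevel
  case mode
  in .o0? then CodeGenLevel::None
  in .o1? then CodeGenLevel::Less
  in .o2? then CodeGenLevel::Default
  in .o3? then CodeGenLevel::Aggressive
  end
end

OptimizationMode.each do |mode|
  puts "#{mode} -> #{codegen_level(mode)}"
end
```

The exhaustive case ... in form means the compiler rejects the helper if a mode is ever added without a matching branch.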

zw963 · 2023-05-13T09:21:20Z

I propose using --level1 or --level2 instead of the current default, because for a project the first build time can always be ignored, since we only need to do it once, right?

But we should keep the old default mode so that users can select it manually.

kostya · 2023-05-14T09:05:32Z

I like the idea of using level1 as the default.

Minuses:

1.5-2 times slower initial compile (done once). Those who use Crystal for scripting would need to pass the --level0 option manually.
Less detailed backtraces; you need to compile with the --debug option to get the same backtrace as with the default (adding this option makes compilation about 15% slower).

Pluses:

Much faster run time, which is good for debugging big applications and for large numbers of specs.
Similar incremental compile speed to the default. Most of the time is spent in incremental compiles.

funny-falcon · 2023-09-16T22:18:24Z

Still no progress? Pity.

kostya · 2023-10-31T18:30:18Z

Why not merge it? It does not change the current compilation (default and --release); it only adds more options for build customization.

funny-falcon · 2023-11-01T23:47:11Z

Yeah, it's a quite strange unwillingness to improve user experience.

straight-shoota · 2023-11-03T11:01:20Z

I'm sorry this has been sitting for so long. It's not unwillingness. There's a lot of review work and limited resources. Sometimes PRs fall through the cracks. 😢
Thanks for calling for attention on this. This is definitely one of the contributions that should not be neglected.

straight-shoota

Looks great overall! I have some suggestions for small improvements.
And merge conflicts need to be resolved via git merge master.

straight-shoota · 2023-11-03T11:24:15Z

src/compiler/crystal/compiler.cr

-      current_bc_flags = "#{@codegen_target}|#{@mcpu}|#{@mattr}|#{@release}|#{@link_flags}|#{@mcmodel}"
-      bc_flags_filename = "#{output_dir}/bc_flags"
+      current_bc_flags = "#{@codegen_target}|#{@mcpu}|#{@mattr}|#{@link_flags}|#{@mcmodel}"
+      bc_flags_filename = "#{output_dir}/bc_flags#{optimization_mode_suffix}"


question: Putting the optimization level in the filename seems like a great idea. It changes from the current behaviour where it's only written in the file contents.
I'm wondering about the effects of this.
I suppose it means the caches for different optimization modes won't override each other.
Does it mean different caches stay around? But what about the actual data files in output_dir?

Before this PR, all object files were mixed together, but release used only 2 files, _main.o and _main.bc, so the mixing did not matter much. This PR adds several modes, so every mode has its own compiled objects (.o and .bc) for each class.
As for the actual data, these are still only cache files; once they become useless, they can be removed with rm -rf ~/.cache/crystal. And of course the caches for each mode would stay around. But in everyday usage the cache would be similar to what it was before this PR, because a default compile would generate .o0 files and a release compile would generate just 2 files, _main.o.o3 and _main.bc.o3.
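A minimal sketch of the per-mode cache naming discussed here, so that files from different optimization modes do not overwrite each other. The helper names and the exact suffix format (a dot followed by the lowercased mode name) are assumptions inferred from the file names quoted in this thread, not the compiler's actual implementation.

```crystal
enum OptimizationMode
  O0
  O1
  O2
  O3
end

# Hypothetical helper: derive a per-mode suffix such as ".o2".
def optimization_mode_suffix(mode : OptimizationMode) : String
  ".#{mode.to_s.downcase}"
end

# Hypothetical helper: append the mode suffix to a cache file name.
def cached_name(base : String, mode : OptimizationMode) : String
  "#{base}#{optimization_mode_suffix(mode)}"
end

puts cached_name("_main.o", OptimizationMode::O3) # the "_main.o.o3" name mentioned above
puts cached_name("_main.bc", OptimizationMode::O3)
puts cached_name("bc_flags", OptimizationMode::O0)
```

Because the mode is part of the name, switching between modes reuses each mode's own cache instead of invalidating a shared one, at the cost of keeping more files around.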

straight-shoota · 2023-11-03T11:37:29Z

src/compiler/crystal/compiler.cr

@@ -755,7 +810,7 @@ module Crystal
        end

        if must_compile
-          compiler.optimize llvm_mod if compiler.release?
+          compiler.optimize llvm_mod if compiler.optimization_mode != OptimizationMode::O0


suggestion: Predicate method is type safe and more concise:

Suggested change

-          compiler.optimize llvm_mod if compiler.optimization_mode != OptimizationMode::O0
+          compiler.optimize llvm_mod unless compiler.optimization_mode.o0?
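For readers unfamiliar with the predicate methods this suggestion relies on: Crystal enums generate a question-mark method for each member, so the two forms are interchangeable. A small standalone sketch, using an illustrative enum rather than the compiler's own definition:

```crystal
enum OptimizationMode
  O0
  O1
  O2
  O3
end

mode = OptimizationMode::O2

# Equivalent checks; o0? is generated automatically from the member O0.
by_comparison = mode != OptimizationMode::O0
by_predicate = !mode.o0?

puts "optimize? #{by_predicate}"
puts "both forms agree: #{by_comparison == by_predicate}"
```

A misspelled predicate fails at compile time because no such method exists on the enum.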

straight-shoota · 2023-11-03T11:42:09Z

src/compiler/crystal/macros/macros.cr

@@ -145,7 +145,7 @@ class Crystal::Program

      # Although release takes longer, once the bc is cached in .crystal
      # the subsequent times will make program execution faster.
-      host_compiler.release = true
+      host_compiler.release!


note: According to #13505 (comment) ff. it seems to be more efficient to build macros not in a single module.
It's probably best to leave the current behaviour in place here. We can follow up with a change to macro generation config. Since the host compiler inherits its configuration from the target compiler there's more involved than just switching this to host_compiler.optimization_mode = :O3.

I'm not sure about changing this, because as I understand it, this is for compiling run macros, which should have a fast runtime, and it is done only once. So release! is appropriate here.

@kostya macros should run "fast enough". Compiling without "single-module" is certainly fast enough for macros, since they are not computation-heavy. I doubt you could measure the difference; I'd bet a case of beer on it. But the difference in time spent compiling the macros is certainly measurable.

kostya · 2023-11-04T12:36:27Z

rebased, squashed

straight-shoota

LGTM.

P.S. Next time, please do not force push. Just merge and amend new commits. That makes reviews easier. Thanks. 🙏 (ref: https://github.com/crystal-lang/crystal/blob/master/CONTRIBUTING.md#making-good-pull-requests).

zw963 · 2023-11-05T05:50:06Z

Cool.

funny-falcon reviewed May 12, 2023

funny-falcon reviewed May 12, 2023

jkthorne reviewed May 13, 2023

kostya changed the title ~~Add incremental release compilation~~ Add incremental optimization levels May 13, 2023

HertzDevil added kind:feature topic:compiler:codegen labels May 13, 2023

kostya commented May 13, 2023

straight-shoota mentioned this pull request May 26, 2023

More codegen optimization options #13505

Closed

Blacksmoke16 reviewed May 26, 2023

straight-shoota reviewed Nov 3, 2023

Add optimization modes O0, O1, O2 and O3

f7bf957

kostya force-pushed the opt_level branch from cd06160 to f7bf957 Compare November 4, 2023 12:35

straight-shoota approved these changes Nov 4, 2023

HertzDevil approved these changes Nov 4, 2023

straight-shoota added this to the 1.11.0 milestone Nov 4, 2023

Add strict number for each option, because it is parsed by from_value? method

6e4be4c

straight-shoota merged commit e838701 into crystal-lang:master Nov 6, 2023
54 of 55 checks passed

Blacksmoke16 pushed a commit to Blacksmoke16/crystal that referenced this pull request Dec 11, 2023

Add incremental optimization levels (crystal-lang#13464)

2e0c5db

straight-shoota added the topic:compiler:cli label Dec 21, 2023

This was referenced Jan 2, 2024

Add optimization levels to manpage #14162

Merged

Add optimization levels to compiler manual crystal-lang/crystal-book#729

Merged

BrewTestBot mentioned this pull request Jan 8, 2024

crystal 1.11.0 Homebrew/homebrew-core#159350

Merged

1 task

zw963 mentioned this pull request Jan 13, 2024

Optimize runtime for non single-module(01,02) compilation #14225

Draft
