More codegen optimization options #13505
Comments
O1 vs O2 could differ more if loop unrolling were disabled for O1. Disabling loop unrolling certainly improves compilation speed and hurts performance. I'll try to redo the measurements by Monday to show numbers. I don't think a new explicit "Header" annotation is worth it, but an implicit one for small methods plus "AlwaysInline" is reasonable. Crystal still accumulates the whole code for analysis, so it looks like it would not hurt compilation time much. I doubt -O3 without --single-module is reasonable. If users don't want every bit of performance, then -O2 is already good (especially if "header" methods are implemented). If they do, then single-module compilation with --release/-O3 is just fine. But I could be wrong here. I really want to change the default option. A default of -O0 is almost useless: only very simple scripts benefit from "super fast compilation" compared to -O1, and debug information with -O1 should be enough in most cases (this still needs to be checked).
After checking those benchmarks, from my point of view -O1 is much better than the default of no optimization. Even a small improvement for incremental builds is huge! Sure, I'm missing the context for why no optimization was selected as the default; there may be concerns behind that decision.
I've tried to disable loop unrolling for O1 (funny-falcon@94fe0cd) on top of @kostya's AlwaysInline optimizations, but didn't see a significant effect on Kostya's brainfuck benchmark or on Crystal self-compilation.
But compiling macros with O1 (funny-falcon@146069d) gave a 4-5 second improvement on Crystal self-compilation at the -O1 level (1:15 -> 1:10 real time, 2:52 -> 2:48 CPU time) and at the -O2 level (1:18 -> 1:14 real time).
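To put the timings quoted above into relative terms, here is a minimal sketch that converts the m:ss values to seconds and computes the saving. The helper names `to_seconds` and `saving` are made up for illustration; only the timings themselves come from the comment.

```python
def to_seconds(stamp):
    """Convert an 'm:ss' timing such as '1:15' into whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def saving(before, after):
    """Return (seconds saved, percent saved) between two 'm:ss' timings."""
    b, a = to_seconds(before), to_seconds(after)
    return b - a, 100.0 * (b - a) / b


# Timings as reported in the comment above (macros compiled with O1).
measurements = [
    ("-O1 real time", "1:15", "1:10"),
    ("-O1 CPU time", "2:52", "2:48"),
    ("-O2 real time", "1:18", "1:14"),
]

for label, before, after in measurements:
    secs, pct = saving(before, after)
    print(f"{label}: {secs} s saved ({pct:.1f}%)")
```

By this arithmetic the real-time saving is roughly 5 to 7 percent, and the CPU-time saving is a little over 2 percent.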
I think it would have an effect only in single-module mode.
@kostya Looks like you're right: just disabling single_module for macros compilation (and keeping the O3 optimization level) saves time as well.
I ran some benchmarks:
Before (branch opt_level): binarytrees
After optimizations (branch optimizations2): kostya/crystal@opt_level...kostya:crystal:optimizations2 binarytrees
Best results: After optimizations in
Not all optimizations was done. There is remains optimizations in
Conclusion: After optimizations
It seems that no optimization still gives a better result than -O1/-O2 for incremental builds?
@zw963 Compile time yes, but this was expected; look at the run time.
It depends on the definition of "better".
My take on this, mostly what's been said already:
I consider the idea of pure Crystal inlining interesting, but orthogonal. I'd rather put that in another issue.
This was resolved in #13464
Hi @kostya, I see this issue is closed, so what is the difference between
These optimizations (kostya/crystal@opt_level...kostya:crystal:optimizations2) were not merged. They are quite questionable, but they hugely improve run time for incremental release compilation.
Yes, questionable, but we can discuss it. Should a new issue be created for this?
This is what I gain with these optimizations:
That is promising. Can you please create another pull request containing only those parts? I would like to add a compiler switch that, when turned on, applies all the patches. The official compiler might not need it, but it is surely useful for my daily coding sessions.
Opened it here: #14225
Code optimization in the Crystal compiler is provided by LLVM and currently has two optimization modes: by default there are no code optimizations, and --release builds a single LLVM module and applies the highest available optimizations (-O3, "aggressive").
This heavy optimization results in long build times (up to 10x longer than non-release). Besides the optimizations themselves being more extensive, the single module prevents parallel processing and re-use for repeated builds.
--release mode is great for release builds with the best possible runtime performance. But there are other use cases for decently performing builds with a more restrained compile time.
This has previously been discussed in the forum: https://forum.crystal-lang.org/t/faster-release-compile-times-but-slightly-worse-performance/3864
#13464 contains a PoC implementation and benchmarks for it. I'm very happy about the numbers because they allow us to make informed decisions about the expected performance profiles.
Thanks @kostya for driving this forward!
I think it's better to move this discussion to a dedicated issue for clarity, independent of the specific PR implementation.
I'm noting my observations (some may have already been mentioned before):
As a first step, I would not change anything about the existing behaviours (what --release and the default mean). Let's first just add options for intermediate optimization levels to the compiler. That can give us a better understanding of how they play out on Crystal projects.
We can later introduce additional changes to the meaning of --release and the default behaviour, or potentially introduce additional CLI options as well.
In both examples, --level1 and --level2 appear almost identical. Any idea why that is? Surely there must be a significant difference between the optimization levels in LLVM. Maybe it doesn't have much effect in the specific types of programs? Or for Crystal programs in general? It could be worth investigating, because it's very odd that these two levels appear almost redundant to each other.
Other LLVM frontends (clang, rustc) use -O0 -O1 -O2 -O3 for opt level indication. I think it's reasonable to follow that naming for the CLI options, even if that can mean a different effective impact than the same levels in other compilers.
A distinction between aggressive optimization (O3) and single-module would be nice. It should be possible to use -O3 without single module. -O3 should not imply --single-module, but --release would expand to -O3 --single-module.
This would give another option between --level2 and --release in the benchmarks, which might be a very reasonable choice to get performant builds repetitively.
This is meant to start a discussion, not express final opinions. Please let's discuss these (and other aspects) first before making changes to the PR.
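The proposed relationship between the flags can be sketched in a few lines. This is only an illustration of the semantics suggested above (--release as shorthand for -O3 --single-module, with -O3 alone not implying a single module). It is not how the Crystal compiler parses its options, and the function names `expand_release` and `effective_options` are hypothetical.

```python
def expand_release(args):
    """Replace the --release shorthand with its proposed explicit form.

    Hypothetical helper: assumes --release means -O3 --single-module,
    as suggested in the issue text.
    """
    expanded = []
    for arg in args:
        if arg == "--release":
            expanded.extend(["-O3", "--single-module"])
        else:
            expanded.append(arg)
    return expanded


def effective_options(args):
    """Return (opt_level, single_module) for a list of CLI arguments."""
    opt_level = 0          # current default: no optimizations
    single_module = False  # modules are built separately by default
    for arg in expand_release(args):
        if arg in ("-O0", "-O1", "-O2", "-O3"):
            opt_level = int(arg[2])  # later flags override earlier ones
        elif arg == "--single-module":
            single_module = True
    return opt_level, single_module


# Compare the proposed shorthand with plain -O3 and the default.
print(effective_options(["--release"]))
print(effective_options(["-O3"]))
print(effective_options([]))
```

Under these assumptions, -O3 on its own keeps separate modules, which is the intermediate option between the second optimization level and --release described above.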