SCons: Make `lto=auto` prefer ThinLTO over full LTO for LLVM targets #96785
Conversation
Did some tests for the Web export templates, for now evaluating only build time and build size. Initial findings are that enabling LTO (both thin and full) significantly increases the binary size. I haven't evaluated performance for now, but all my builds are available here if someone wants to benchmark them (not sure how to do this easily on the web). They're built from this PR, which has the same base commit as 4.4-dev2: https://downloads.tuxfamily.org/godotengine/testing/4.4-dev2-lto-comparison-web.zip

(Table: Web `template_release` build time/size comparison, not recoverable here.)
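For anyone wanting to reproduce the comparison, these were plain template builds; a sketch of the invocations, assuming an activated Emscripten SDK (the `lto` values are the ones compared above):

```sh
# Build the Web release template once per LTO mode; compare build time
# and the size of the resulting artifacts in bin/.
scons platform=web target=template_release lto=none
scons platform=web target=template_release lto=thin
scons platform=web target=template_release lto=full
```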
Tested for Android, and similarly found that LLVM LTO (both thin and full) seems to significantly increase the binary size. Likewise, I haven't judged the performance impact yet. I'm also uploading my builds if someone wants to do some benchmarking (apk and zip only, to minimize the total download size). https://downloads.tuxfamily.org/godotengine/testing/4.4-dev2-lto-comparison-android.zip

(Table: Android `template_release` build size comparison, not recoverable here.)
Did some more compilation tests for Linux, with both GCC 14.2.1 and LLVM 18.1.6. It seems like our common argument that LTO improves not only performance but also binary size holds true for GCC builds (-9% on the release template with GCC full LTO), but is totally wrong for LLVM (+14.5% for ThinLTO and +11.5% for full LTO, with atrocious build times for the latter). Again, I haven't checked performance numbers. https://downloads.tuxfamily.org/godotengine/testing/4.4-dev2-lto-comparison-linux.zip

(Table: Linux `template_release` build time/size comparison for GCC and LLVM, not recoverable here.)
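A sketch of the matching Linux invocations (binary names in `bin/` may vary per architecture; `use_llvm=yes` selects Clang in Godot's buildsystem, and outputs share a name across `lto` modes, so rename them between runs):

```sh
# GCC: full LTO is the interesting mode (GCC's LTO links in parallel).
scons platform=linuxbsd target=template_release lto=none
scons platform=linuxbsd target=template_release lto=full

# LLVM/Clang: compare ThinLTO against full LTO and no LTO.
scons platform=linuxbsd target=template_release use_llvm=yes lto=none
scons platform=linuxbsd target=template_release use_llvm=yes lto=thin
scons platform=linuxbsd target=template_release use_llvm=yes lto=full

# Compare sizes (LLVM builds get an extra .llvm suffix).
ls -l bin/godot.linuxbsd.template_release.*
```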
After searching random things online, it's possible that the extra size comes from inlining. From what I'm reading, the whole point of LTO is inlining between compilation units, besides more accurate dead code elimination and the like, so extra size is to be expected. It looks like there are some extra settings that might be useful, pointed out by this link: https://discourse.llvm.org/t/clang-lld-thin-lto-footprint-and-run-time-performance-outperformed-by-gcc-ld/78997 I haven't read it fully but it seems very relevant to this PR. Probably the easiest change would be to pass […]
It looks like LLVM is way more focused on performance, which might explain why even -Os seems to inline more, or at least not optimize for size as much as GCC. The link above mentions a lot of other things we could try. It also mentions an interesting feature called "remarks", which apparently makes the compiler tell us why it hasn't optimized something: https://llvm.org/docs/Remarks.html I have no idea how it works, but it might be insightful if there's an easy way to parse the resulting file (the wiki mentions some […]). @akien-mga, if it isn't too annoying, could you also pass a […]?
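For what it's worth, Clang can emit remarks without buildsystem changes; the flags below are standard Clang options (`some_file.cpp` is just an illustration):

```sh
# Emit machine-readable optimization records (YAML) alongside each object file:
clang++ -O3 -flto=thin -fsave-optimization-record -c some_file.cpp

# Or print human-readable inliner remarks directly during compilation:
clang++ -O3 -Rpass=inline -Rpass-missed=inline -c some_file.cpp
```

LLVM also ships an `opt-viewer.py` tool that renders the YAML records as annotated source, which may be the easiest way to parse them.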
(Force-pushed from bbe5449 to d4655b1.)
Changed the title from "`lto=auto` enable/prefer ThinLTO for LLVM targets" to "`lto=auto` prefer ThinLTO over full LTO for LLVM targets".
Based on findings so far, I updated this PR to only change the platforms which currently used LLVM's full LTO, switching them to ThinLTO. For our official builds, this means it only affects: […]

To evaluate the actual gains of various LTO configurations, and see if we should start using LTO for Android/macOS/Windows clang-cl (and maybe iOS, but LTO caused slow linking in Xcode there; this could maybe be re-assessed with ThinLTO), I'll open a new issue where I'll share my metrics again (and @Riteo can add the research they wrote here).
(Force-pushed from d4655b1 to c814952.)
I have not tested LTO on current master for macOS/iOS (will do in a few hours), but last time I did, the pattern was the same as for other Clang builds: size increase for both thin and full LTO, and a huge build time difference (and memory usage) for full LTO.
I opened #96851 to continue the in-depth review of the different configuration options for each target. In the meantime, this PR just switches LLVM full LTO to ThinLTO for the targets that currently use LTO.
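In buildsystem terms the switch is small. Here is a minimal sketch of the kind of logic this implies, not the literal diff; `configure_lto` is a hypothetical helper and the `env` keys follow Godot's SCons conventions:

```python
# Sketch: resolve lto=auto to a concrete mode, preferring ThinLTO when
# building with LLVM, since LLVM's full LTO link step is single-threaded.
def configure_lto(env):
    if env["lto"] == "auto":
        env["lto"] = "thin" if env["use_llvm"] else "full"

    if env["lto"] == "thin":
        # ThinLTO: per-module summaries, parallel LTO link.
        env.Append(CCFLAGS=["-flto=thin"])
        env.Append(LINKFLAGS=["-flto=thin"])
    elif env["lto"] == "full":
        # Full LTO: whole-program optimization, much slower to link on LLVM.
        env.Append(CCFLAGS=["-flto"])
        env.Append(LINKFLAGS=["-flto"])
```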
Makes sense to me. The time and memory usage required for full LTO can be prohibitive for casual builders who just use `production=yes` to create their own binaries, so ThinLTO is a better default.
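For context, that is the typical one-liner such builders run; `production=yes` enables `lto=auto` among other optimizations, so with this PR an LLVM-based build now resolves to ThinLTO (platform and target below are just examples):

```sh
scons platform=linuxbsd target=template_release production=yes use_llvm=yes
```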
This speeds up build time considerably for these platforms compared to using `lto=full`, which is sadly single-threaded with LLVM, unlike GCC.

Changes to default behavior of `lto=auto` (i.e. `production=yes`):

- Linux: Prefer ThinLTO for LLVM
- Web: Prefer ThinLTO
- Windows: Prefer ThinLTO for llvm-mingw

The following LLVM targets don't use LTO by default currently, which needs to be assessed further (gains from LLVM LTO on performance need to be weighed against the potential size increase from heavy inlining):

- Android
- iOS
- macOS
- Windows clang-cl
(Force-pushed from c814952 to 26db0bb.)
Edit: Changed the scope of this PR to only impact targets for which we already used LLVM's full LTO, changing those to ThinLTO to speed up builds significantly.
Needs heavy testing and comparison of builds with and without LTO (thin/full) for the affected platforms.

We should benchmark and document once and for all the impact of LTO on build time, build size, and performance for each platform, so we can default to the optimal configuration out of the box.
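A rough harness for the build-time/size half of that benchmark might look like the sketch below (binary name and platform are placeholder assumptions; runtime performance would still need separate engine benchmarks on top):

```sh
#!/bin/sh
# Sketch: measure wall-clock build time and binary size per LTO mode.
BIN=bin/godot.linuxbsd.template_release.x86_64.llvm  # assumed output name
for lto in none thin full; do
    scons --clean platform=linuxbsd target=template_release use_llvm=yes > /dev/null
    start=$(date +%s)
    scons platform=linuxbsd target=template_release use_llvm=yes "lto=$lto"
    end=$(date +%s)
    echo "lto=$lto: $((end - start))s, $(wc -c < "$BIN") bytes"
done
```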