Use thinlto and pgo for x86_64 windows release packaging #71067

Merged: 4 commits into llvm:main on Nov 14, 2023

Conversation

@zmodem (Collaborator) commented Nov 2, 2023

Applying this to 17.0.4 makes the installer 5% smaller (298 MB vs. 315 MB) and the toolchain 20% faster (as measured by building clang).

@zmodem (Collaborator, Author) commented Nov 2, 2023

+cc @pbo-linaro

This has been on the to-do list for a long time.

Why not x86 also? We can't do bootstrapping thinlto because the linker will run out of memory. I think it's okay to focus on x86_64 for performance. Maybe we can even stop building the 32-bit binaries eventually.

@ms178 commented Nov 2, 2023

Why stop with LTO+PGO? BOLTing Clang/LLD should yield another sizeable improvement.

Added my recent BOLT stats for Clang/LLD-18 (on Linux):
LLVM-18-BOLT-231026_stats.txt

@pbo-linaro (Contributor) commented Nov 3, 2023

That would be a nice addition @zmodem.
I know it's a boring request, but could you evaluate the benefits of PGO alone separately, and then of adding ThinLTO?
The same applies if we add BOLT too.
Ideally, build times should be checked too.

Regarding the external source file used for profiling (pgo_training-1.ii), I don't know LLVM policy, but maybe it would be better to host it on our side, so we're sure it does not disappear one of these days.
By the way, what is the difference with pgo_training-2.ii, which is also available? If you have links explaining those files, it would be nice to add them as a comment.

Finally, the aarch64 build could benefit from that too, but that can be done later.

@zmodem (Collaborator, Author) commented Nov 6, 2023

> Why stop with LTO+PGO? BOLTing Clang/LLD should yield another sizeable improvement.

My understanding was that BOLT for PE/COFF binaries isn't production ready.

> Regarding the external source file used for profiling (pgo_training-1.ii), I don't know LLVM policy, but maybe it would be better to host it on our side, so we're sure it does not disappear one of these days. By the way, what is the difference with pgo_training-2.ii, which is also available? If you have links explaining those files, it would be nice to add them as a comment.

Thinking about this some more, using pgo_training-1.ii is convenient for Chromium, but may not be right for LLVM. As you say, we should probably store it somewhere in the project, and at 11 MB we can't just put it in Git. Additionally, it may need updating now and then.

I've pushed a new commit to train by building Clang's Sema.cpp instead. It requires running CMake one more time, but I think that's okay. The performance seems to be the same.

(To answer your questions anyway, the origin of pgo_training-1.ii is documented here: https://source.chromium.org/chromium/chromium/src/+/main:tools/clang/scripts/build.py;l=1044 The story of the -2 version is here: https://bugs.chromium.org/p/chromium/issues/detail?id=984067)
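For anyone following along, the two-stage flow described above looks roughly like the sketch below. This is illustrative Python rather than the actual build_llvm_package.py code: the directory names, the clangSema training target, and the use of the LLVM_BUILD_INSTRUMENTED / LLVM_PROFDATA_FILE CMake options are my assumptions about the setup, not quotes from the script.

```python
# Rough sketch of a two-stage PGO build of clang (not the real packaging script).
import glob
import os
import subprocess

def run(*cmd):
    print(" ".join(cmd))
    subprocess.check_call(list(cmd))

# Stage 1: build an instrumented clang that emits raw profiles when it runs.
run("cmake", "-G", "Ninja", "-S", "llvm", "-B", "build-instrumented",
    "-DCMAKE_BUILD_TYPE=Release",
    "-DLLVM_ENABLE_PROJECTS=clang",
    "-DLLVM_BUILD_INSTRUMENTED=IR")
run("ninja", "-C", "build-instrumented", "clang")

# Training: configure a second tree with the instrumented clang and compile
# something representative (the PR trains on Clang's Sema.cpp; building the
# whole clangSema target here is a simplification).
instrumented = os.path.abspath("build-instrumented/bin/clang")
run("cmake", "-G", "Ninja", "-S", "llvm", "-B", "build-train",
    "-DCMAKE_BUILD_TYPE=Release",
    "-DLLVM_ENABLE_PROJECTS=clang",
    "-DCMAKE_C_COMPILER=" + instrumented,
    "-DCMAKE_CXX_COMPILER=" + instrumented + "++")
run("ninja", "-C", "build-train", "clangSema")

# Merge the raw profiles (written into the instrumented build's profiles/
# directory by default) and feed them into the final optimized build.
raw_profiles = glob.glob("build-instrumented/profiles/*.profraw")
run("llvm-profdata", "merge", "-o", "clang.profdata", *raw_profiles)
run("cmake", "-G", "Ninja", "-S", "llvm", "-B", "build-final",
    "-DCMAKE_BUILD_TYPE=Release",
    "-DLLVM_ENABLE_PROJECTS=clang",
    "-DLLVM_PROFDATA_FILE=" + os.path.abspath("clang.profdata"))
run("ninja", "-C", "build-final")
```

The extra training configure/build step in the middle is what "running CMake one more time" refers to.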

> I know it's a boring request, but could you evaluate the benefits of PGO alone separately, and then of adding ThinLTO? [..] Ideally, build times should be checked too.

Build times on my workstation:

1. Baseline (using the checked-in version of build_llvm_package.py): 1h 30m
2. PGO only: 1h 40m (+11%)
3. PGO + ThinLTO: 3h 44m (+149%)

Performance, using the clang produced above to do a release build of clang (metrics are best-of-two):

1. Baseline: 4m 55s
2. PGO only: 3m 49s (-22%)
3. PGO + ThinLTO: 3m 38s (-26%)

I was surprised that PGO helped so much by itself. Given that ThinLTO builds are so much slower to produce, I'm tempted to say we should just go with PGO for now. What do you think?

> Finally, the aarch64 build could benefit from that too, but that can be done later.

Yes, and hopefully we can refactor some of the code to be shared at that point.

@ms178 commented Nov 6, 2023

> > Why stop with LTO+PGO? BOLTing Clang/LLD should yield another sizeable improvement.

> My understanding was that BOLT for PE/COFF binaries isn't production ready.

@maksfb or @aaupov might have some insights to share about BOLT's PE/COFF binary support, or might be interested in a BOLT-optimized Windows release package.

@pbo-linaro (Contributor) commented

Thanks for taking the time to check the different configurations.

Using LLVM source code for the profiling data is a good solution IMHO.
For any reasonably complicated project using "most" C++ features, it should produce similar data.

The price for ThinLTO is surprisingly high (in memory usage too, I guess), and compared to PGO alone, the added value is pretty low in this case. Considering the numbers, I agree with you that we should only do PGO, as it's almost "free".

If we go with the PGO-only version, it would be nice to directly factor this out for all architectures (the do_build_libxml function provides an example).

@aaupov (Contributor) commented Nov 7, 2023

There's no BOLT support for PE/COFF, nor any plans on our end.

@zmodem (Collaborator, Author) commented Nov 7, 2023

> The price for ThinLTO is surprisingly high (in memory usage too, I guess), and compared to PGO alone, the added value is pretty low in this case. Considering the numbers, I agree with you that we should only do PGO, as it's almost "free".

Part of the high cost is that it applies not just to clang, lld, etc., but all the binaries, including the ones only used for testing.

> If we go with the PGO-only version, it would be nice to directly factor this out for all architectures (the do_build_libxml function provides an example).

Agreed, but can we do it in a follow-up when applying this to more targets? (For example, the PGO part should work for x86 too.)

@pbo-linaro (Contributor) commented

Oh yes, I missed that. So memory usage might be more of a problem when linking all the binaries with ThinLTO. Better not to enable this for now.

No problem doing it in another PR. Indeed, all the architectures (x86, x64, arm64) could benefit from PGO.

@mstorsjo (Member) commented Nov 7, 2023

A quick side question regarding PGO (while listening in on this topic): how portable is the profile information across builds? If I e.g. build Clang on Linux, then do some profiling with that, I can use that to do a new Clang build for Linux with PGO. But can I use the same profile information for PGO when I, still on Linux, cross-compile Clang to run on Windows?

@zmodem (Collaborator, Author) commented Nov 14, 2023

Apologies for the late reply; I got distracted by other things.

> how portable is the profile information across builds?

It can work across targets, but e.g. if the function names from the profile and the current build don't match because of mangling, those functions will not benefit from the profile info.

@zmodem (Collaborator, Author) commented Nov 14, 2023

I extracted the profile generation into a separate "function". It doesn't work on 32-bit x86 because the linker runs out of memory, but it would be interesting to hear whether it works for arm64!

@zmodem merged commit 060af26 into llvm:main on Nov 14, 2023 (2 of 3 checks passed)
@mstorsjo (Member) commented

> > how portable is the profile information across builds?

> It can work across targets, but e.g. if the function names from the profile and the current build don't match because of mangling, those functions will not benefit from the profile info.

Ah, thanks! That makes sense.

So between Linux and MinGW on x86_64, arm, and aarch64, the names should mostly match. (I guess the exception is cases where types are defined differently, and those types are visible in function signatures.) Cases where functions take C++ standard library types probably won't match. Linux and MSVC builds probably wouldn't match at all, as the C++ symbol name mangling is different.

One interesting case is where the mangling mostly matches, but one target has an extra symbol prefix while the other one doesn't - like Windows/i386, and macOS on all architectures. It would be neat if it were possible to apply a transformation to the profile information, to add/remove such a prefix, to make it match the other target better. CC @ZequanWu - how feasible would this be?
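(As a purely illustrative sketch of the idea, not an existing tool or a supported interface: one could round-trip the profile through llvm-profdata's text format and rewrite the names there. The parsing below, which treats the first non-comment, non-numeric line of each record as the mangled function name, is a heuristic assumption about that format.)

```python
# Hypothetical prefix rewrite for an instrumentation profile (illustration
# only). Assumes llvm-profdata can round-trip a profile through its text
# format and that each record starts with the mangled function name.
import subprocess

def add_underscore_prefix(in_profdata, out_profdata):
    text_tmp = "prof.proftext"
    rewritten_tmp = "prof.rewritten.proftext"

    # Dump the binary profile to text form.
    subprocess.check_call(["llvm-profdata", "merge", "--text",
                           "-o", text_tmp, in_profdata])

    out_lines = []
    with open(text_tmp) as f:
        for line in f:
            s = line.strip()
            looks_like_name = (s
                               and not s.startswith("#")  # annotation comments
                               and not s.startswith(":")  # e.g. the ":ir" header
                               and not s.isdigit())       # hashes / counter values
            if looks_like_name:
                # Local symbols may appear as "file.cpp:symbol"; only prefix
                # the symbol part (heuristic).
                head, sep, name = s.rpartition(":")
                out_lines.append(head + sep + "_" + name + "\n")
            else:
                out_lines.append(line)

    with open(rewritten_tmp, "w") as f:
        f.writelines(out_lines)

    # Re-merge the rewritten text profile back into binary form.
    subprocess.check_call(["llvm-profdata", "merge",
                           "-o", out_profdata, rewritten_tmp])
```

Stripping a prefix instead would be the same loop with the underscore removed rather than added.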

@zmodem (Collaborator, Author) commented Nov 15, 2023

I used the new version of the script to package the 17.0.5 binaries, so if someone wants to try the PGO-built toolchain, it's here: https://github.com/llvm/llvm-project/releases/download/llvmorg-17.0.5/LLVM-17.0.5-win64.exe

@zmodem deleted the thinlto_packaging branch on November 15, 2023
@ZequanWu (Contributor) commented

> One interesting case is where the mangling mostly matches, but one target has an extra symbol prefix while the other one doesn't - like Windows/i386, and macOS on all architectures. It would be neat if it were possible to apply a transformation to the profile information, to add/remove such a prefix, to make it match the other target better. CC @ZequanWu - how feasible would this be?

IIUC, you want to remove such a mangled-name prefix when using a profile from x64 but targeting Windows i386. I'm not sure about this; @xur-llvm might know. BTW, what's the extra symbol prefix? Is it prefixed to all symbols on Windows i386?

@mstorsjo (Member) commented

> IIUC, you want to remove such a mangled-name prefix when using a profile from x64 but targeting Windows i386.

My primary use case would be to profile on x86_64 Linux, and apply that profile when I cross-build binaries for Windows i386. (The symbol names should mostly match without other tricks when cross-building binaries for the other Windows architectures.)

The same mechanism would also help when profiling on x86_64 Windows and using the profile for a Windows i386 build, or when profiling on Linux but using the profile on macOS, or vice versa.

> I'm not sure about this; @xur-llvm might know. BTW, what's the extra symbol prefix? Is it prefixed to all symbols on Windows i386?

Almost, yes. On macOS, essentially all extern symbols have an extra underscore at the start. On Windows i386, all cdecl symbols have an extra underscore at the start as well. MSVC C++ ABI symbols don't have such an extra underscore on i386 though. But the Itanium ABI/name mangling used in MinGW should be essentially the same as on other Itanium platforms, modulo an extra leading underscore. And the same goes for Linux<->macOS.
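To make that concrete, here is how the same two symbols would look on the targets being compared, assuming standard Itanium mangling (the mangled names are standard; the snippet itself is only an illustration):

```python
# Illustration of the leading-underscore difference described above.
# MSVC C++ mangling is entirely different and is not shown.
itanium = "_Z3fooi"          # int foo(int) on Linux / MinGW x86_64 and aarch64
prefixed = "_" + itanium     # "__Z3fooi" on macOS and on MinGW i386
c_symbol = "bar"             # extern "C" bar() on Linux / MinGW x86_64
c_prefixed = "_" + c_symbol  # "_bar" on macOS and Windows i386
print(prefixed, c_prefixed)
```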
