forked from JuliaLang/julia
Commit
This uses LLVM's BOLT to optimize libLLVM, libjulia-internal and libjulia-codegen. This improves the allinference benchmarks by about 10%, largely due to the optimization of libjulia-internal. The example in issue JuliaLang#45395, which stresses LLVM significantly more, also sees a ~10% improvement. We see a 20% improvement on

```julia
@time for i in 1:100000000 string(i) end
```

When building corecompiler.ji:

- BOLT gives about a 16% improvement
- PGO+LTO gives about a 21% improvement
- PGO+LTO+BOLT gives about a 23% improvement

This only requires a single build of LLVM, compared to the two builds PGO requires. In theory it could require none if we changed the BinaryBuilder script (i.e. we build with relocations and `-fno-reorder-blocks-and-partition`, then use BOLT to get binaries with no relocations and reordered blocks, and ship both binaries?). BOLT can also, in theory, improve the performance of a PGO+LTO build by a couple of percent.

The only reproducible test problem I see is that the BOLT, PGO+LTO and PGO+LTO+BOLT builds all cause `readelf` to emit warnings as part of the `osutils` tests:

```
readelf: Warning: Unrecognised form: 0x22
readelf: Warning: DIE has locviews without loclist
readelf: Warning: Unrecognised form: 0x23
readelf: Warning: DIE at offset 0x227399 refers to abbreviation number 14754 which does not exist
readelf: Warning: Bogus end-of-siblings marker detected at offset 212aa9 in .debug_info section
readelf: Warning: Bogus end-of-siblings marker detected at offset 212ab0 in .debug_info section
readelf: Warning: Further warnings about bogus end-of-sibling markers suppressed
```

The "Unrecognised form" warnings seem to be a bug in binutils, https://sourceware.org/bugzilla/show_bug.cgi?id=28981. I believe the "DIE at offset" warning was fixed in binutils 2.36 (https://sourceware.org/bugzilla/show_bug.cgi?id=26808), but `ld -v` says I have 2.38. I assume these are all benign. I also don't see them on CI (https://buildkite.com/julialang/julia-buildkite/builds/1507#018f00e7-0737-4a42-bcd9-d4061dc8c93e), so this could just be a local issue.
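The approach depends on building the libraries with relocations preserved (`-Wl,--emit-relocs`, passed through `BOLT_LDFLAGS` in the Makefile), which is what lets BOLT move functions around. The following is a rough sketch of how one might confirm that a stage1 library actually kept those relocations before instrumenting it. The helper names are hypothetical and are not part of this commit. The `.rela.text` heuristic is an assumption: `--emit-relocs` keeps static relocation sections in the linked output, while an ordinary shared library normally has only `.rela.dyn`/`.rela.plt`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of this commit.
# Assumption: a library linked with -Wl,--emit-relocs still lists a static
# relocation section such as .rela.text in `readelf -SW` output.

# Reads `readelf -SW <lib>` output on stdin; succeeds if .rela.text is listed.
has_emitted_relocs() {
    grep -Eq '[[:space:]]\.rela\.text([[:space:]]|\.|$)'
}

# Runs readelf on one library and reports the result.
check_library() {
    if readelf -SW "$1" | has_emitted_relocs; then
        echo "$1: static relocations present"
    else
        echo "$1: no .rela.text section; was it built with BOLT_LDFLAGS=-Wl,--emit-relocs?" >&2
        return 1
    fi
}
```

For example, `check_library optimized.build/usr/lib/libjulia-internal.so` after `make stage1`. This is only a heuristic. The authoritative signal is whether `llvm-bolt` itself reports that it is running in relocation mode.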
Showing 14 changed files with 525 additions and 7 deletions.
@@ -0,0 +1,10 @@

```
profiles-bolt*
optimized.build
toolchain

bolt
bolt_instrument
merge_data
copy_originals
stage0
stage1
```
@@ -0,0 +1,134 @@

```makefile
.PHONY: clean clean_profiles restore_originals delete_originals finish_stage1

# Settings taken from https://github.com/rust-lang/rust/blob/master/src/tools/opt-dist/src/bolt.rs
BOLT_ARGS :=
# Reorder basic blocks within functions
BOLT_ARGS += -reorder-blocks=ext-tsp
# Reorder functions within the binary
BOLT_ARGS += -reorder-functions=cdsort
# Split function code into hot and cold regions
BOLT_ARGS += -split-functions
# Split as many basic blocks as possible
BOLT_ARGS += -split-all-cold
# Move jump tables to a separate section
BOLT_ARGS += -jump-tables=move
# Use regular size pages for code alignment
BOLT_ARGS += -no-huge-pages
# Fold functions with identical code
BOLT_ARGS += -icf=1
# Split using best available strategy (three-way splitting, Cache-Directed Sort)
# Disabled for libjulia-internal till https://github.com/llvm/llvm-project/issues/89508 is fixed
# BOLT_ARGS += -split-strategy=cdsplit
# Update DWARF debug info in the final binary
BOLT_ARGS += -update-debug-sections
# Print optimization statistics
BOLT_ARGS += -dyno-stats
# BOLT doesn't fully support computed gotos, https://github.com/llvm/llvm-project/issues/89117
# Use escaped regex as the name BOLT recognises is often a bit different, e.g. apply_cl/1(*2)
# This doesn't actually seem to do anything; the actual mitigation is not passing --use-old-text
# for libjulia-internal, which the bolt target does
BOLT_ARGS += -skip-funcs=.\*apply_cl.\*

# -fno-reorder-blocks-and-partition is needed on gcc >= 8.
BOLT_FLAGS := $\
    "BOLT_CFLAGS_GCC+=-fno-reorder-blocks-and-partition" $\
    "BOLT_LDFLAGS=-Wl,--emit-relocs"

STAGE0_BUILD:=$(CURDIR)/toolchain
STAGE1_BUILD:=$(CURDIR)/optimized.build

STAGE0_BINARIES:=$(STAGE0_BUILD)/usr/bin/

PROFILE_DIR:=$(CURDIR)/profiles-bolt
JULIA_ROOT:=$(CURDIR)/../..

LLVM_BOLT:=$(STAGE0_BINARIES)llvm-bolt
LLVM_MERGEFDATA:=$(STAGE0_BINARIES)merge-fdata

# If you add new files to optimize, you need to add BOLT_LDFLAGS and BOLT_CFLAGS to the build of your new file.
SYMLINKS_TO_OPTIMIZE := libLLVM.so libjulia-internal.so libjulia-codegen.so
FILES_TO_OPTIMIZE := $(shell for file in $(SYMLINKS_TO_OPTIMIZE); do readlink $(STAGE1_BUILD)/usr/lib/$$file; done)

AFTER_INSTRUMENT_MESSAGE:='Run `make finish_stage1` to finish off the build. $\
    You can now optionally collect more profiling data by running Julia with an appropriate workload; $\
    if you wish, run `make clean_profiles` first to remove any profiling data generated by `make finish_stage1`. $\
    You should end up with some data in $(PROFILE_DIR). Afterwards run `make merge_data && make bolt`.'

$(STAGE0_BUILD) $(STAGE1_BUILD):
	$(MAKE) -C $(JULIA_ROOT) O=$@ configure

stage0: | $(STAGE0_BUILD)
	$(MAKE) -C $(STAGE0_BUILD)/deps install-BOLT && \
	touch $@

# Build with our custom flags; BinaryBuilder doesn't use them, so we need to build LLVM ourselves for now.
# We manually skip package image creation so that we can profile it
$(STAGE1_BUILD): stage0
stage1: export USE_BINARYBUILDER_LLVM=0
stage1: | $(STAGE1_BUILD)
	$(MAKE) -C $(STAGE1_BUILD) $(BOLT_FLAGS) julia-src-release julia-symlink julia-libccalltest \
		julia-libccalllazyfoo julia-libccalllazybar julia-libllvmcalltest && \
	touch $@

copy_originals: stage1
	for file in $(FILES_TO_OPTIMIZE); do \
		abs_file=$(STAGE1_BUILD)/usr/lib/$$file; \
		cp $$abs_file "$$abs_file.original"; \
	done && \
	touch $@

# I don't think there's any particular reason to have -no-huge-pages here; perhaps it gives slightly more
# accurate profile data, as the final build uses -no-huge-pages
bolt_instrument: copy_originals
	for file in $(FILES_TO_OPTIMIZE); do \
		abs_file=$(STAGE1_BUILD)/usr/lib/$$file; \
		$(LLVM_BOLT) "$$abs_file.original" -o $$abs_file --instrument --instrumentation-file-append-pid --instrumentation-file="$(PROFILE_DIR)/$$file-prof" -no-huge-pages; \
		mkdir -p $$(dirname "$(PROFILE_DIR)/$$file-prof"); \
		printf "\n"; \
	done && \
	touch $@
	@echo $(AFTER_INSTRUMENT_MESSAGE)

# We don't want to rebuild julia-src, as then we would lose the BOLT instrumentation,
# so we have to build the sysimage and package image manually
finish_stage1: stage1
	$(MAKE) -C $(STAGE1_BUILD) julia-base-cache && \
	$(MAKE) -C $(STAGE1_BUILD) -f sysimage.mk sysimg-release && \
	$(MAKE) -C $(STAGE1_BUILD) -f pkgimage.mk release

merge_data: bolt_instrument
	for file in $(FILES_TO_OPTIMIZE); do \
		profiles=$(PROFILE_DIR)/$$file-prof.*.fdata; \
		$(LLVM_MERGEFDATA) $$profiles > "$(PROFILE_DIR)/$$file-prof.merged.fdata"; \
	done && \
	touch $@

# --use-old-text tries to reuse the old text segment to reduce binary size; it saves about 16 MiB of libLLVM.so.
# However, the Rust folks found that it succeeds very non-deterministically for them.
# BOLT doesn't fully support computed gotos (https://github.com/llvm/llvm-project/issues/89117), so we cannot use --use-old-text on libjulia-internal.
# That flag saves less than 1 MiB for libjulia-internal, so oh well.
bolt: merge_data
	for file in $(FILES_TO_OPTIMIZE); do \
		abs_file=$(STAGE1_BUILD)/usr/lib/$$file; \
		$(LLVM_BOLT) "$$abs_file.original" -data "$(PROFILE_DIR)/$$file-prof.merged.fdata" -o $$abs_file $(BOLT_ARGS) $$(if [ "$$file" != $(shell readlink $(STAGE1_BUILD)/usr/lib/libjulia-internal.so) ]; then echo "--use-old-text -split-strategy=cdsplit"; fi); \
	done && \
	touch $@

clean_profiles:
	rm -rf $(PROFILE_DIR)

clean:
	rm -f stage0 stage1 bolt copy_originals merge_data bolt_instrument

restore_originals: copy_originals
	for file in $(FILES_TO_OPTIMIZE); do \
		abs_file=$(STAGE1_BUILD)/usr/lib/$$file; \
		cp -P "$$abs_file.original" $$abs_file; \
	done

delete_originals: copy_originals
	for file in $(FILES_TO_OPTIMIZE); do \
		abs_file=$(STAGE1_BUILD)/usr/lib/$$file; \
		rm "$$abs_file.original"; \
	done
```
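The `merge_data` recipe relies on `--instrumentation-file-append-pid`. Each process that loads an instrumented library writes its own profile, and the recipe globs those as `<prefix>.*.fdata` and merges them with `merge-fdata`. The following is a minimal sketch of the same step for a single prefix, in plain shell. `merge_profiles` and the `MERGE_FDATA` override are hypothetical names used only for this illustration. One design choice differs from the recipe. The pattern `<prefix>.*.fdata` also matches the `<prefix>.merged.fdata` output, so a re-run (for example after `make clean` removes the `merge_data` stamp) could otherwise feed a stale or already-truncated merged file back into `merge-fdata`. The sketch therefore skips that file explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the merge_data step for one library.
# Assumptions: per-process profiles are named <prefix>.<pid>.fdata (the naming
# the Makefile's glob expects), and MERGE_FDATA names LLVM's merge-fdata tool.

MERGE_FDATA=${MERGE_FDATA:-merge-fdata}

merge_profiles() {
    local prefix=$1
    local out="$prefix.merged.fdata"
    local f
    set --  # reuse the function's positional parameters as the file list
    for f in "$prefix".*.fdata; do
        [ -e "$f" ] || continue          # glob matched nothing
        [ "$f" = "$out" ] && continue    # never merge a previous result into itself
        set -- "$@" "$f"
    done
    if [ "$#" -eq 0 ]; then
        echo "no profiles found for $prefix" >&2
        return 1
    fi
    # merge-fdata writes the merged profile to stdout, as in the Makefile.
    "$MERGE_FDATA" "$@" > "$out"
}
```

Mirroring the Makefile, a call might look like `merge_profiles "profiles-bolt/$(readlink optimized.build/usr/lib/libjulia-internal.so)-prof"`, with `MERGE_FDATA` pointing at `toolchain/usr/bin/merge-fdata`.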
@@ -0,0 +1,17 @@

BOLT only works on x86_64 and aarch64 on Linux.

DO NOT STRIP THE RESULTING .so FILES, https://github.com/llvm/llvm-project/issues/56738.
If you really need to, try adding `-use-gnu-stack` to `BOLT_ARGS`.

To build a BOLT-optimized version of Julia, run the following commands (`cd` into this directory first):
```bash
make stage1
make copy_originals
make bolt_instrument
make finish_stage1
make merge_data
make bolt
```
After these commands finish, the optimized version of Julia will be built in the `optimized.build` directory.

This doesn't align the code to support huge pages, since it doesn't seem that we do that currently; this decreases the size of the .so files by 2-4 MB.
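The six `make` invocations above can be wrapped in a small script that stops at the first failing step. As the Makefile's post-instrumentation message suggests, the script can also optionally run an extra profiling workload between `finish_stage1` and `merge_data`. This is a sketch under stated assumptions. `run_bolt_pipeline`, `MAKE` and `WORKLOAD` are hypothetical names introduced here, not variables the Makefile reads.

```bash
#!/usr/bin/env bash
# Hypothetical driver for the README's build sequence; run it from the
# directory that contains the BOLT Makefile.
# MAKE and WORKLOAD are knobs of this sketch only.

MAKE=${MAKE:-make}

run_bolt_pipeline() {
    local target
    # Build, snapshot the originals, instrument, then finish the build
    # (which already produces some profile data).
    for target in stage1 copy_originals bolt_instrument finish_stage1; do
        "$MAKE" "$target" || return
    done
    # Optional extra profiling with the instrumented Julia.
    if [ -n "${WORKLOAD:-}" ]; then
        sh -c "$WORKLOAD" || return
    fi
    "$MAKE" merge_data || return
    "$MAKE" bolt
}
```

For example, `WORKLOAD='optimized.build/usr/bin/julia my_workload.jl' run_bolt_pipeline`, where `my_workload.jl` stands in for whatever script exercises the code paths you care about. Keeping each step a separate `make` call mirrors the README, so a failed step can be resumed by hand.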
@@ -0,0 +1,14 @@

```
stage0*
stage1*
stage2*
bolt
bolt_instrument
merge_data
copy_originals

profiles
profiles-bolt

toolchain
pgo-instrumented.build
optimized.build
```