Add BOLT Makefile #54107

Merged
merged 44 commits into from
Jul 26, 2024
Commits
c4ce68f
Initial Bolt
Zentrik Apr 2, 2024
1ccb1c2
Got BOLT working on all specified .so's some of the time
Zentrik Apr 3, 2024
aea1b1d
Don't optimize sys.so
Zentrik Apr 3, 2024
844b590
Experiment with bolt a bit
Zentrik Apr 3, 2024
5d71322
Hacky way to only use bolt specific flags with .so we will instrument
Zentrik Apr 3, 2024
d478312
Fixup previous commit
Zentrik Apr 3, 2024
a9f524f
Only BOLT libLLVM
Zentrik Apr 4, 2024
3b2ff79
Revert "Only BOLT libLLVM"
Zentrik Apr 4, 2024
a2a7585
Remove rebuild pkgimage step
Zentrik Apr 4, 2024
3e34a36
Clean up
Zentrik Apr 4, 2024
6acf7b9
Update BOLT options for BOLT 18
Zentrik Apr 16, 2024
ce15112
Fix remove originals
Zentrik Apr 16, 2024
9d3d7a3
add BOLT dependency
Zentrik Apr 16, 2024
6c8f549
Only install bolt related binaries and libraries
Zentrik Apr 16, 2024
182c992
Remove thread-count options
Zentrik Apr 16, 2024
e20349a
Add trailing new lines
Zentrik Apr 16, 2024
db0682e
Fix segfault when using BOLT to optimize libjulia-internal
Zentrik Apr 17, 2024
6ecdc98
Fix typos
Zentrik Apr 17, 2024
91ae73d
Fix flags
Zentrik Apr 17, 2024
a454179
Remove install target
Zentrik Apr 17, 2024
137bf15
Bump BOLT to 18.1.4
Zentrik Apr 17, 2024
89e0d9e
Add TODO
Zentrik Apr 17, 2024
1ee10c0
Remove --use-old-text to prevent segfault
Zentrik Apr 17, 2024
e433792
Workaround segfault when using `--use-old-text`
Zentrik Apr 19, 2024
fc96bfc
Fix jll link and capitalise BOLT
Zentrik Apr 19, 2024
3d1cafe
Remove outdated comment
Zentrik Apr 19, 2024
24ba6e8
Remove resolved TODO
Zentrik Apr 19, 2024
6eb0209
Manually rebuild pkgimage, fixup profiling message and optimize libju…
Zentrik Apr 19, 2024
feee71b
Fix message
Zentrik Apr 19, 2024
7c18d0f
Add trailing new line
Zentrik Apr 19, 2024
e259150
Only use `-fno-reorder-blocks-and-partition` for binaries that will g…
Zentrik Apr 19, 2024
984c714
Fix previous commit potentionally
Zentrik Apr 19, 2024
08fd8f1
Fix nit
Zentrik Apr 19, 2024
6d6a83b
Clean up documentation a bit
Zentrik Apr 20, 2024
396e2e0
Add PGO+LTO+BOLT Makefile
Zentrik Apr 20, 2024
d8f8694
Profile sysimg build as well
Zentrik Apr 20, 2024
d33de97
Fix premature terminator warning
Zentrik Apr 20, 2024
9018311
Fix whitespace
Zentrik Apr 20, 2024
c8fbf3b
Remove reference to LoopVectorization
Zentrik Apr 21, 2024
37a64ea
Remove claim of macos support and clarify Readme
Zentrik Apr 23, 2024
4685de3
Remove lines that should have been deleted from Readme
Zentrik Apr 23, 2024
cc176b6
Delete checksum
Zentrik Jul 25, 2024
dc59798
Fix typo
Zentrik Jul 25, 2024
a94e146
Delete checksum
Zentrik Jul 25, 2024
13 changes: 10 additions & 3 deletions Make.inc
@@ -516,6 +516,11 @@ SHIPFLAGS_COMMON := -O3
SHIPFLAGS_CLANG := $(SHIPFLAGS_COMMON) -g
SHIPFLAGS_GCC := $(SHIPFLAGS_COMMON) -ggdb2 -falign-functions

+BOLT_LDFLAGS :=
+
+BOLT_CFLAGS_GCC :=
+BOLT_CFLAGS_CLANG :=
+
ifeq ($(OS), Darwin)
JCPPFLAGS_CLANG += -D_LARGEFILE_SOURCE -D_DARWIN_USE_64_BIT_INODE=1
endif
@@ -532,7 +537,8 @@ JCFLAGS := $(JCFLAGS_GCC)
JCPPFLAGS := $(JCPPFLAGS_GCC)
JCXXFLAGS := $(JCXXFLAGS_GCC)
DEBUGFLAGS := $(DEBUGFLAGS_GCC)
-SHIPFLAGS := $(SHIPFLAGS_GCC)
+SHIPFLAGS := $(SHIPFLAGS_GCC) $(BOLT_CFLAGS_GCC)
+BOLT_CFLAGS := $(BOLT_CFLAGS_GCC)
endif

ifeq ($(USECLANG),1)
@@ -542,7 +548,8 @@ JCFLAGS := $(JCFLAGS_CLANG)
JCPPFLAGS := $(JCPPFLAGS_CLANG)
JCXXFLAGS := $(JCXXFLAGS_CLANG)
DEBUGFLAGS := $(DEBUGFLAGS_CLANG)
-SHIPFLAGS := $(SHIPFLAGS_CLANG)
+SHIPFLAGS := $(SHIPFLAGS_CLANG) $(BOLT_CFLAGS_CLANG)
+BOLT_CFLAGS := $(BOLT_CFLAGS_CLANG)

ifeq ($(OS), Darwin)
CC += -mmacosx-version-min=$(MACOSX_VERSION_MIN)
@@ -1295,7 +1302,7 @@ CSL_NEXT_GLIBCXX_VERSION=GLIBCXX_3\.4\.33|GLIBCXX_3\.5\.|GLIBCXX_4\.
# Note: we explicitly _do not_ define `CSL` here, since it requires some more
# advanced techniques to decide whether it should be installed from a BB source
# or not. See `deps/csl.mk` for more detail.
-BB_PROJECTS := BLASTRAMPOLINE OPENBLAS LLVM LIBSUITESPARSE OPENLIBM GMP MBEDTLS LIBSSH2 NGHTTP2 MPFR CURL LIBGIT2 PCRE LIBUV LIBUNWIND DSFMT OBJCONV ZLIB P7ZIP LLD LIBTRACYCLIENT
+BB_PROJECTS := BLASTRAMPOLINE OPENBLAS LLVM LIBSUITESPARSE OPENLIBM GMP MBEDTLS LIBSSH2 NGHTTP2 MPFR CURL LIBGIT2 PCRE LIBUV LIBUNWIND DSFMT OBJCONV ZLIB P7ZIP LLD LIBTRACYCLIENT BOLT
define SET_BB_DEFAULT
# First, check to see if BB is disabled on a global setting
ifeq ($$(USE_BINARYBUILDER),0)
10 changes: 10 additions & 0 deletions contrib/bolt/.gitignore
@@ -0,0 +1,10 @@
profiles-bolt*
optimized.build
toolchain

bolt
bolt_instrument
merge_data
copy_originals
stage0
stage1
134 changes: 134 additions & 0 deletions contrib/bolt/Makefile
@@ -0,0 +1,134 @@
.PHONY: clean clean_profiles restore_originals

# Settings taken from https://github.com/rust-lang/rust/blob/master/src/tools/opt-dist/src/bolt.rs
BOLT_ARGS :=
# Reorder basic blocks within functions
BOLT_ARGS += -reorder-blocks=ext-tsp
# Reorder functions within the binary
BOLT_ARGS += -reorder-functions=cdsort
# Split function code into hot and cold regions
BOLT_ARGS += -split-functions
# Split as many basic blocks as possible
BOLT_ARGS += -split-all-cold
# Move jump tables to a separate section
BOLT_ARGS += -jump-tables=move
# Use regular size pages for code alignment
BOLT_ARGS += -no-huge-pages
# Fold functions with identical code
BOLT_ARGS += -icf=1
# Split using best available strategy (three-way splitting, Cache-Directed Sort)
# Disabled for libjulia-internal till https://github.com/llvm/llvm-project/issues/89508 is fixed
# BOLT_ARGS += -split-strategy=cdsplit
# Update DWARF debug info in the final binary
BOLT_ARGS += -update-debug-sections
# Print optimization statistics
BOLT_ARGS += -dyno-stats
# BOLT doesn't fully support computed gotos, https://github.com/llvm/llvm-project/issues/89117
# Use escaped regex as the name BOLT recognises is often a bit different, e.g. apply_cl/1(*2)
# This doesn't actually seem to do anything, the actual mitigation is not using --use-old-text
# which we do in the bolt target
BOLT_ARGS += -skip-funcs=.\*apply_cl.\*

# -fno-reorder-blocks-and-partition is needed on gcc >= 8.
BOLT_FLAGS := $\
"BOLT_CFLAGS_GCC+=-fno-reorder-blocks-and-partition" $\
"BOLT_LDFLAGS=-Wl,--emit-relocs"

STAGE0_BUILD:=$(CURDIR)/toolchain
STAGE1_BUILD:=$(CURDIR)/optimized.build

STAGE0_BINARIES:=$(STAGE0_BUILD)/usr/bin/

PROFILE_DIR:=$(CURDIR)/profiles-bolt
JULIA_ROOT:=$(CURDIR)/../..

LLVM_BOLT:=$(STAGE0_BINARIES)llvm-bolt
LLVM_MERGEFDATA:=$(STAGE0_BINARIES)merge-fdata

# If you add new files to optimize, you need to add BOLT_LDFLAGS and BOLT_CFLAGS to the build of your new file.
SYMLINKS_TO_OPTIMIZE := libLLVM.so libjulia-internal.so libjulia-codegen.so
FILES_TO_OPTIMIZE := $(shell for file in $(SYMLINKS_TO_OPTIMIZE); do readlink $(STAGE1_BUILD)/usr/lib/$$file; done)

AFTER_INSTRUMENT_MESSAGE:='Run `make finish_stage1` to finish off the build. $\
You can now optionally collect more profiling data by running Julia with an appropriate workload, $\
if you wish, run `make clean_profiles` before doing so to remove any profiling data generated by `make finish_stage1`. $\
You should end up with some data in $(PROFILE_DIR). Afterwards run `make merge_data && make bolt`.'

$(STAGE0_BUILD) $(STAGE1_BUILD):
	$(MAKE) -C $(JULIA_ROOT) O=$@ configure

stage0: | $(STAGE0_BUILD)
	$(MAKE) -C $(STAGE0_BUILD)/deps install-BOLT && \
	touch $@

# Build with our custom flags, binary builder doesn't use them so we need to build LLVM for now.
# We manually skip package image creation so that we can profile it
$(STAGE1_BUILD): stage0
stage1: export USE_BINARYBUILDER_LLVM=0
stage1: | $(STAGE1_BUILD)
	$(MAKE) -C $(STAGE1_BUILD) $(BOLT_FLAGS) julia-src-release julia-symlink julia-libccalltest \
	julia-libccalllazyfoo julia-libccalllazybar julia-libllvmcalltest && \
	touch $@

copy_originals: stage1
	for file in $(FILES_TO_OPTIMIZE); do \
		abs_file=$(STAGE1_BUILD)/usr/lib/$$file; \
		cp $$abs_file "$$abs_file.original"; \
	done && \
	touch $@

# I don't think there's any particular reason to have -no-huge-pages here, perhaps slightly more accurate profile data
# as the final build uses -no-huge-pages
bolt_instrument: copy_originals
	for file in $(FILES_TO_OPTIMIZE); do \
		abs_file=$(STAGE1_BUILD)/usr/lib/$$file; \
		$(LLVM_BOLT) "$$abs_file.original" -o $$abs_file --instrument --instrumentation-file-append-pid --instrumentation-file="$(PROFILE_DIR)/$$file-prof" -no-huge-pages; \
		mkdir -p $$(dirname "$(PROFILE_DIR)/$$file-prof"); \
		printf "\n"; \
	done && \
	touch $@
	@echo $(AFTER_INSTRUMENT_MESSAGE)

# We don't want to rebuild julia-src as then we lose the bolt instrumentation
# So we have to manually build the sysimage and package image
finish_stage1: stage1
	$(MAKE) -C $(STAGE1_BUILD) julia-base-cache && \
	$(MAKE) -C $(STAGE1_BUILD) -f sysimage.mk sysimg-release && \
	$(MAKE) -C $(STAGE1_BUILD) -f pkgimage.mk release

merge_data: bolt_instrument
	for file in $(FILES_TO_OPTIMIZE); do \
		profiles=$(PROFILE_DIR)/$$file-prof.*.fdata; \
		$(LLVM_MERGEFDATA) $$profiles > "$(PROFILE_DIR)/$$file-prof.merged.fdata"; \
	done && \
	touch $@

# The --use-old-text saves about 16 MiB of libLLVM.so size.
# However, the rust folk found it succeeds very non-deterministically for them.
# It tries to reuse old text segments to reduce binary size
# BOLT doesn't fully support computed gotos https://github.com/llvm/llvm-project/issues/89117, so we cannot use --use-old-text on libjulia-internal
# That flag saves less than 1 MiB for libjulia-internal so oh well.
bolt: merge_data
	for file in $(FILES_TO_OPTIMIZE); do \
		abs_file=$(STAGE1_BUILD)/usr/lib/$$file; \
		$(LLVM_BOLT) "$$abs_file.original" -data "$(PROFILE_DIR)/$$file-prof.merged.fdata" -o $$abs_file $(BOLT_ARGS) $$(if [ "$$file" != $(shell readlink $(STAGE1_BUILD)/usr/lib/libjulia-internal.so) ]; then echo "--use-old-text -split-strategy=cdsplit"; fi); \
	done && \
	touch $@

clean_profiles:
	rm -rf $(PROFILE_DIR)

clean:
	rm -f stage0 stage1 bolt copy_originals merge_data bolt_instrument

restore_originals: copy_originals
	for file in $(FILES_TO_OPTIMIZE); do \
		abs_file=$(STAGE1_BUILD)/usr/lib/$$file; \
		cp -P "$$abs_file.original" $$abs_file; \
	done

delete_originals: copy_originals
	for file in $(FILES_TO_OPTIMIZE); do \
		abs_file=$(STAGE1_BUILD)/usr/lib/$$file; \
		rm "$$abs_file.original"; \
	done
17 changes: 17 additions & 0 deletions contrib/bolt/README.md
@@ -0,0 +1,17 @@
BOLT only works on x86_64 and aarch64 on Linux.

DO NOT STRIP THE RESULTING .so FILES; see https://github.com/llvm/llvm-project/issues/56738.
If you really need to, try adding `-use-gnu-stack` to `BOLT_ARGS`.

To build a BOLT-optimized version of Julia, run the following commands (`cd` into this directory first):
```bash
make stage1
make copy_originals
make bolt_instrument
make finish_stage1
make merge_data
make bolt
```
After these commands finish, the optimized version of Julia will be built in the `optimized.build` directory.
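
The sequence above can also be scripted. The following is a minimal sketch, not part of this PR: `run`, `bolt_build`, `DRY_RUN` and `WORKLOAD` are hypothetical names, and the `./optimized.build/julia` path is an assumption based on the `julia-symlink` target built during `stage1`. Setting `WORKLOAD` adds the optional extra profiling run that the message printed by `make bolt_instrument` describes.

```bash
#!/usr/bin/env bash
# Sketch only: wraps the README's make targets in order.

# Print a command, then execute it unless DRY_RUN=1 (hypothetical switch).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        return 0
    fi
    "$@"
}

# Run the build steps in order, stopping at the first failure.
# WORKLOAD (hypothetical) is an optional Julia script used to collect
# extra profile data between finish_stage1 and merge_data.
bolt_build() {
    run make stage1 &&
    run make copy_originals &&
    run make bolt_instrument &&
    run make finish_stage1 &&
    { [ -z "${WORKLOAD:-}" ] || run ./optimized.build/julia "$WORKLOAD"; } &&
    run make merge_data &&
    run make bolt
}
```

Calling `DRY_RUN=1 bolt_build` from this directory prints the steps without running them, which is a cheap way to check the order before starting a long build.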

This doesn't align the code to support huge pages, as it doesn't seem that we currently do that elsewhere. Skipping the alignment decreases the size of the .so files by 2-4 MB.
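
Before running `make merge_data`, it can be worth confirming that the instrumented libraries actually wrote profiles. Because `bolt_instrument` passes `--instrumentation-file-append-pid`, each process is expected to leave its own `<library>-prof.<pid>.fdata` file in `profiles-bolt`. The helper below is a hypothetical sketch (`count_profiles` is not part of the Makefile). It counts those files for one library and skips any `-prof.merged.fdata` left by an earlier merge, which the same glob would otherwise match. The library name should be the resolved file name, as in `FILES_TO_OPTIMIZE`, not the symlink.

```bash
#!/usr/bin/env bash
# Sketch only: count per-process BOLT profiles for one library.
# $1 = profile directory, $2 = resolved library file name.
count_profiles() {
    local dir="$1" lib="$2" n=0 f
    for f in "$dir/$lib"-prof.*.fdata; do
        # An unmatched glob stays literal, so skip paths that do not exist.
        [ -e "$f" ] || continue
        # Ignore the output of a previous merge.
        case "$f" in
            *-prof.merged.fdata) continue ;;
        esac
        n=$((n + 1))
    done
    echo "$n"
}
```

For example, `count_profiles profiles-bolt "$(readlink optimized.build/usr/lib/libLLVM.so)"` prints the number of profiles collected for libLLVM; a result of 0 suggests the instrumented library was never loaded.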
14 changes: 14 additions & 0 deletions contrib/pgo-lto-bolt/.gitignore
@@ -0,0 +1,14 @@
stage0*
stage1*
stage2*
bolt
bolt_instrument
merge_data
copy_originals

profiles
profiles-bolt

toolchain
pgo-instrumented.build
optimized.build