Evaluate Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) usage #80
zamazan4ik
started this conversation in
Ideas
Hi!
Recently I checked Profile-Guided Optimization (PGO) improvements on multiple projects. The results are available here. According to the tests, PGO can help with achieving better performance in many cases for many applications, including compilers and static analyzers. Since pylyzer cares about performance, I think optimizing pylyzer with such techniques is an interesting idea. I already did some benchmarks and want to share my results here.
Test environment
tree-sitter build), CFLAGS are -O3
main branch on commit 70c23905ae768ab554000abeefab36fe48ab54f4
Benchmark
For benchmark purposes, I used a scenario from the README file: pylyzer tests/test.py. All PGO and PLO optimizations are done with cargo-pgo. For the Release build, the built-in benchmarks were run with cargo bench -p benches. The PGO instrumentation phase is done with cargo pgo bench -- -p benches, and the PGO-optimized benches are run with cargo pgo optimize bench -- -p benches.
All tests are done on the same machine, multiple times (with hyperfine), with the same background "noise" (as much as I can guarantee, of course). The results are consistent across runs.
The LTO build is done by adding the following lines to the Cargo.toml:
Results
The results:
where:
pylyzer_release - Release build
pylyzer_release_with_lto - Release + LTO build
pylyzer_optimized - Release build + PGO optimization
pylyzer_pgo_and_bolt_optimized - Release build + PGO optimization + PLO optimization (via LLVM BOLT)
pylyzer_lto_and_bolt_optimized - Release build + LTO + PLO optimization with LLVM BOLT
According to the tests above, I see measurable improvements from LTO, PGO, and PLO.
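For anyone who wants to reproduce a comparison like this, here is a rough dry-run sketch of how the PGO and PGO + BOLT builds could be produced with cargo-pgo and then compared with hyperfine. It is not the exact script behind the numbers above. The target directory (an x86_64 Linux host), the -bolt-instrumented binary suffix (as described in the cargo-pgo README), the warmup count, and the assumption that the renamed binaries sit in the current directory are all mine. By default the script only prints the steps; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Dry-run sketch: prints each step by default; set DRY_RUN=0 to execute.
set -eu

DRY_RUN="${DRY_RUN:-1}"
# Assumption: x86_64 Linux host; cargo-pgo builds into the target-triple directory.
BIN_DIR="${BIN_DIR:-target/x86_64-unknown-linux-gnu/release}"

# Print the command in dry-run mode, otherwise execute it.
# Note: the dry-run output drops the quoting around multi-word arguments.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# PGO: instrumented build, profile-gathering run on the README scenario,
# then the optimized rebuild.
pgo_steps() {
  run cargo pgo build
  run "$BIN_DIR/pylyzer" tests/test.py
  run cargo pgo optimize
}

# BOLT on top of PGO: relies on the PGO profiles gathered in pgo_steps.
pgo_bolt_steps() {
  run cargo pgo bolt build --with-pgo
  run "$BIN_DIR/pylyzer-bolt-instrumented" tests/test.py
  run cargo pgo bolt optimize --with-pgo
}

# Benchmark all five binaries on the same input in one hyperfine invocation.
# Assumption: the binaries were copied into the current directory under the
# names used in the list above.
compare() {
  set -- hyperfine --warmup 3
  for bin in pylyzer_release pylyzer_release_with_lto pylyzer_optimized \
             pylyzer_pgo_and_bolt_optimized pylyzer_lto_and_bolt_optimized; do
    set -- "$@" "./$bin tests/test.py"
  done
  run "$@"
}

pgo_steps
pgo_bolt_steps
compare
```

Passing all binaries to a single hyperfine call keeps the runs on the same machine under the same conditions and lets hyperfine print the relative speedups itself.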
For reference, here are the performance results from the PGO and PLO instrumentation phases (with and without LTO):
PGO instrumented run:
LTO enabled + PGO instrumented run:
LLVM BOLT instrumented run:
LTO enabled + LLVM BOLT instrumented run:
Further steps
I can suggest the following action points:
pylyzer according to their workloads.
Here are some examples of how PGO optimization is integrated into other projects:
configure script
I have some examples of how PGO information looks in the project-specific documentation:
Regarding LLVM BOLT integration, I have the following links:
By the way, I think applying PGO and PLO to https://github.com/erg-lang/erg would be a good idea. What do you think? If you agree, do I need to create a separate issue in the erg repo?
Another idea: what do you think about enabling LTO for the project? It can help with performance and binary size reduction as well. However, currently, LTO and PGO cannot be enabled at the same time for Pylyzer due to a bug in Rustc: rust-lang/rust#115344 (comment). According to the tests above, LTO + BOLT works even a bit faster than PGO + BOLT.
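If someone wants to try LTO quickly without touching Cargo.toml, Cargo also accepts profile overrides through environment variables. The sketch below uses CARGO_PROFILE_RELEASE_LTO for a one-off release build. The value fat is my assumption, not necessarily the setting used in the tests above, and because of the Rustc bug mentioned above this is meant for a plain cargo build, not for a cargo pgo build. By default it only prints the command; set DRY_RUN=0 to run it.

```shell
#!/bin/sh
# Dry-run sketch: prints the command by default; set DRY_RUN=0 to execute.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# One-off release build with LTO enabled via Cargo's environment override
# of profile.release.lto. "fat" is an assumed value; "thin" is also accepted.
lto_build() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "CARGO_PROFILE_RELEASE_LTO=fat cargo build --release"
  else
    CARGO_PROFILE_RELEASE_LTO=fat cargo build --release
  fi
}

lto_build
```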