Evaluate more advanced optimizations like LTO, PGO, PLO #141
zamazan4ik started this conversation in Ideas
Hi!
I just read an article about Harper on Reddit. Nice work! I have several ideas that might be interesting to try with Harper to improve its performance and reduce its binary size.
First, I noticed that Link-Time Optimization (LTO) is not enabled. Have you tried enabling it for the project before? It can noticeably reduce the binary size and allows the compiler to perform more aggressive cross-crate optimizations (always a good thing to have). If you think that enabling LTO in the default release profile would hurt the developer experience too much (LTO increases build times), you can create a dedicated build profile such as "advanced_release" or "dist"; many projects enable LTO exactly this way.
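A minimal sketch of two ways to try this without touching the default release profile, assuming the functions are sourced from the Cargo workspace root (profiles in a workspace must be declared in the root Cargo.toml). The profile name dist and the helper function names are only examples, not something Harper already defines. Cargo's inherits key lets a custom profile reuse the release settings, and Cargo also reads profile overrides from CARGO_PROFILE_<NAME>_<KEY> environment variables, which is handy for a quick one-off measurement.

```shell
# Sketch of two ways to try LTO without changing the default release profile.
# Assumes it is sourced from the Cargo workspace root; names are examples.

# Option A: a dedicated profile ("dist" is just an example name) that
# inherits everything from "release" and only adds the LTO-related settings.
add_dist_profile() {
  local manifest="${1:-Cargo.toml}"
  # Append the table only once, so re-running does not duplicate it.
  if ! grep -q '^\[profile\.dist\]' "$manifest"; then
    cat >> "$manifest" <<'EOF'

[profile.dist]
inherits = "release"
lto = "fat"
codegen-units = 1
EOF
  fi
}

# Option B: a one-off experiment using Cargo's CARGO_PROFILE_<NAME>_<KEY>
# environment overrides; extra arguments are forwarded to cargo build.
build_release_with_lto() {
  CARGO_PROFILE_RELEASE_LTO=fat \
  CARGO_PROFILE_RELEASE_CODEGEN_UNITS=1 \
    cargo build --release "$@"
}
```

For example, running add_dist_profile and then cargo build --profile dist puts the LTO build under target/dist, leaving the regular release build (and its build times) untouched. codegen-units = 1 usually trades longer compile times for better optimization, which is one more reason to keep it in a separate profile.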
Second, after LTO I highly recommend looking at PGO (Profile-Guided Optimization). This optimization gives the compiler information about how the program actually executes. Based on it, the compiler can make better optimization decisions (for example, which functions to inline and how to lay out hot and cold code), which improves runtime performance. I collect as many materials about PGO as I can in my repo (https://github.com/zamazan4ik/awesome-pgo). There you can find actual PGO benchmarks for various kinds of software (parsers, compilers, databases, etc.). I also highly recommend reading the (not yet finished) article/book about PGO; it may answer many of your questions.
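For reference, the manual PGO workflow documented in the rustc book looks roughly like the sketch below; cargo-pgo, which I used for the benchmarks in this post, automates the same steps. PGO_DIR and the function names are hypothetical. llvm-profdata must come from an LLVM version compatible with the one rustc uses (for example, the llvm-tools-preview rustup component), and the rustc book recommends passing an explicit --target so that RUSTFLAGS do not also apply to build scripts and proc macros.

```shell
# Sketch of the manual PGO workflow from the rustc book.
# PGO_DIR and the function names are hypothetical; adjust to taste.
PGO_DIR="${PGO_DIR:-/tmp/pgo-data}"

# Step 1: build with instrumentation; running the resulting binary writes
# .profraw files into $PGO_DIR. Extra arguments (e.g. --target) go to cargo.
pgo_instrument() {
  RUSTFLAGS="-Cprofile-generate=$PGO_DIR" cargo build --release "$@"
}

# Step 2 happens outside this script: run the instrumented binary on
# representative inputs (the training workload).

# Step 3: merge the raw profiles into a single .profdata file.
pgo_merge() {
  llvm-profdata merge -o "$PGO_DIR/merged.profdata" "$PGO_DIR"
}

# Step 4: rebuild, letting the compiler optimize using the merged profile.
pgo_optimize() {
  RUSTFLAGS="-Cprofile-use=$PGO_DIR/merged.profdata" cargo build --release "$@"
}
```

Typical usage: pgo_instrument --target x86_64-unknown-linux-gnu, run the binary from target/x86_64-unknown-linux-gnu/release/ on a realistic workload, then pgo_merge and pgo_optimize --target x86_64-unknown-linux-gnu. How much this helps depends heavily on how representative the training workload is.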
I also ran some quick PGO benchmarks for the project using its built-in benchmarks.
Test environment:
Harper version: master branch at commit ccf14d1535c2f1450b42027afac2a8446f98e11d
taskset -c 0 is used to reduce noise from the OS scheduler during the benchmarks (as far as I can guarantee, of course). For PGO optimization I use the cargo-pgo tool. I got the following results.
Release (taskset -c 0 cargo bench --workspace --all-features):
PGO optimized compared to Release (taskset -c 0 cargo pgo optimize bench -- --workspace --all-features):
(just for reference) PGO instrumented compared to Release (taskset -c 0 cargo pgo bench -- --workspace --all-features):
According to the results, PGO can further improve the library's performance. However, the uncached benchmark shows a performance regression. I think this is caused by skew in the training data across the different benchmark workloads; more experiments could be done in this area. Until then, this PGO-related information may already be helpful for other performance-oriented users.
After PGO, I suggest evaluating PLO (Post-Link Optimization) with LLVM BOLT as an additional optimization step. However, I recommend enabling it only after PGO, since in practice PGO currently tends to give better results than PLO.
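Here is a sketch of chaining PGO and BOLT with cargo-pgo's bolt subcommands, assuming cargo-pgo and the LLVM tools it relies on (llvm-profdata, llvm-bolt, merge-fdata) are installed. The pgo_then_bolt function and its two workload commands are hypothetical placeholders: each workload should run the freshly built instrumented binary on representative inputs, and the binary locations depend on the target and binary name.

```shell
# Sketch: PGO first, then BOLT on top, via cargo-pgo.
# Assumes cargo-pgo, llvm-profdata, llvm-bolt and merge-fdata are installed.

# Arguments are commands (hypothetical, project-specific) that exercise the
# instrumented binaries. They are word-split, so "cmd arg" works, but
# arguments containing spaces do not.
pgo_then_bolt() {
  local pgo_workload=$1
  local bolt_workload=$2

  # 1. PGO-instrumented build, then gather PGO profiles.
  cargo pgo build || return
  $pgo_workload || return

  # 2. BOLT-instrumented build of the PGO-optimized binary, then gather BOLT profiles.
  cargo pgo bolt build --with-pgo || return
  $bolt_workload || return

  # 3. Final binary: PGO-optimized and post-link optimized with BOLT.
  cargo pgo bolt optimize --with-pgo
}
```

Keeping PGO as the first stage matches the recommendation above: BOLT then only reorders code that has already been optimized with the PGO profiles.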
Regarding priorities: I highly suggest enabling LTO now. PGO and PLO, IMHO, can wait (I guess spending this time on actual features would be a better option, since enabling PGO and PLO, along with the CI pipeline changes they may require, can consume a lot of human resources).
Thank you!