Evaluate Profile-Guided Optimization (PGO) usage #1

Open
zamazan4ik opened this issue May 23, 2024 · 1 comment

Comments

@zamazan4ik

Hi!

Since the README mentions many performance-oriented features, I decided to test one compiler optimization, Profile-Guided Optimization (PGO), on genson-rs. I have already tested it on various projects with positive results (all benchmarks are available here: https://github.com/zamazan4ik/awesome-pgo), so here are the benchmark results for genson-rs.

Test environment

  • Fedora 39
  • Linux kernel 6.8.7
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler: Rustc 1.78
  • Project version: the latest main branch at the time of testing, commit 67afe6d3ad8d10affb65b251694ca7b52b978769
  • Turbo Boost disabled

Benchmark

For benchmarking, I use the benchmarks built into the project. For PGO, I use the cargo-pgo tool. I got the Release results with the taskset -c 0 cargo bench command. The PGO training phase is done with taskset -c 0 cargo pgo bench, and the PGO optimization phase with taskset -c 0 cargo pgo optimize bench.

All measurements are done on the same machine, with the same background "noise" (as far as I can guarantee). taskset -c 0 pins the process to a single CPU core to reduce OS scheduler "noise".
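
The three measurement steps described above can be sketched as a small shell script. This is only an illustration, not the exact script used for the measurements: the run helper and the pgo_benchmark function name are made up for this example, and it assumes cargo-pgo is already installed (typically via cargo install cargo-pgo plus rustup component add llvm-tools-preview). By default it only prints the commands, and it runs them when RUN=1 is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical dry-run helper: prints each command, and executes it
# only when RUN=1 is set in the environment.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# The three steps from the issue, all pinned to CPU core 0 with taskset.
pgo_benchmark() {
  # 1. Baseline: regular Release benchmarks.
  run taskset -c 0 cargo bench
  # 2. PGO training: instrumented benchmarks that collect profiles.
  run taskset -c 0 cargo pgo bench
  # 3. PGO optimization: benchmarks rebuilt with the collected profiles.
  run taskset -c 0 cargo pgo optimize bench
}
```

Calling pgo_benchmark prints the three commands for review; RUN=1 pgo_benchmark executes them in order from the project root.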

Results

I got the following results:

According to the results, PGO measurably improves the tool's performance at least in the benchmark above.

Further steps

I can suggest the following action points:

  • Perform more PGO benchmarks with other test files. If they show improvements, add a note to the documentation (the README file?) about the possible performance gains from PGO.
  • Optimize prebuilt binaries with PGO (if there are any). As a training set, you can gather multiple real-life files, train PGO on them, and deliver PGO-optimized binaries to users.
  • Consider enabling Link-Time Optimization (LTO) for the tool. It can improve performance and reduce the binary size.
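
As a rough sketch of how the last two points could look in practice, the script below builds a PGO-optimized release binary with cargo-pgo and enables LTO through Cargo's CARGO_PROFILE_RELEASE_LTO environment variable instead of editing Cargo.toml. Several things here are assumptions rather than facts from the repository: the binary path (cargo-pgo builds for an explicit target triple, so the triple in the path depends on your host), the training-data directory, the run helper, and that genson-rs accepts an input file as a positional argument. By default the script only prints the commands; set RUN=1 to execute them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical dry-run helper: prints each command, and executes it
# only when RUN=1 is set in the environment.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Assumed names: adjust the binary path and training directory to your setup.
BIN="${BIN:-./target/x86_64-unknown-linux-gnu/release/genson-rs}"
TRAINING_DIR="${TRAINING_DIR:-./training-data}"

pgo_release_build() {
  # Cargo reads profile settings from the environment; this requests
  # "fat" LTO for the release profile.
  export CARGO_PROFILE_RELEASE_LTO=fat

  # 1. Build an instrumented binary.
  run cargo pgo build

  # 2. Run it on every training file to collect profiles.
  for f in "$TRAINING_DIR"/*.json; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    run "$BIN" "$f"
  done

  # 3. Rebuild using the collected profiles.
  run cargo pgo optimize
}
```

The quality of the result depends on how representative the training files are, so a mix of small and large real-life JSON inputs is probably a better training set than a single benchmark file.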

Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.

I would be happy to answer your questions about PGO.

P.S. I created an Issue because Discussions are disabled for the repo. Since this is not a bug report but an improvement idea, Discussions would probably be a better place for such things.

@junyu-w
Owner

junyu-w commented May 24, 2024

This is a great idea! I will test it out.
