Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic PGO support. #48346

Merged
merged 18 commits into from
Mar 26, 2018
Merged

Add basic PGO support. #48346

merged 18 commits into from
Mar 26, 2018

Conversation

emilio
Copy link
Contributor

@emilio emilio commented Feb 19, 2018

This PR adds two mutually exclusive options for profile usage and generation using LLVM's instruction profile generation (the same as clang uses), -C pgo-use and -C pgo-gen.

See each commit for details.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nikomatsakis (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 19, 2018
@emilio
Copy link
Contributor Author

emilio commented Feb 19, 2018

r? @alexcrichton

May need some cleanup and such, I don't know the policy for new codegen options and such. Also, I've only tested on Linux so far, though it should work everywhere AFAICT.

@emilio emilio changed the title Add basic PGO support. [Do not merge yet] Add basic PGO support. Feb 19, 2018
@emilio
Copy link
Contributor Author

emilio commented Feb 19, 2018

I think this should mostly be ready for review, and sanity-checking the general approach, but I want to test it more during the week and such.

cc #1220

@emilio
Copy link
Contributor Author

emilio commented Feb 19, 2018

I have no idea about the Travis failures, does travis not use the same LLVM version as the repo submodule?

Some nice results with the same test program as https://github.com/Geal/pgo-rust with -O3, no LTO:

$ time ./no-pgo 10000000
10000000
-0.169075164
-0.169077842
./no-pgo 10000000  2.01s user 0.00s system 99% cpu 2.010 total
$ time ./pgo 10000000
10000000
-0.169075164
-0.169077842
./pgo 10000000  0.43s user 0.00s system 99% cpu 0.433 total

(which is pretty ridiculous IMO, though it's the best-case of course).

There's some broken stuff, like pgo-gen not being respected with opt-level (only default.profraw gets generated, not the specified filename, for some reason I still haven't figured...)

@emilio
Copy link
Contributor Author

emilio commented Feb 19, 2018

I think the numbers above aren't representative, since the no-pgo version is instrumented. Will get some actual numbers tomorrow.

@michaelwoerister
Copy link
Member

Thanks for looking into this, @emilio!

Experimental features like this usually start out with a -Z flag. That automatically makes them available only on nightly.

@LifeIsStrange
Copy link

Does it support autoFDO ? It's the secret magic that allow Clear Linux to be 40% Faster than others Linux distros : https://www.phoronix.com/scan.php?page=article&item=ubuntu-clear-eoy2017&num=2

https://clearlinux.org/features/automatic-feedback-directed-optimizer
That would be awesome is that could be enabled by default.. Because most people don't know about those options.

HS: I know you are a Firefox Dev @emilio don't you think you could use this for freely significatively improving Firefox/servo performance ?

if cg.pgo_gen.is_some() && !cg.pgo_use.is_empty() {
early_error(
error_format,
"options `-C pgo-gen` and `-C pgo-use` are exclussive",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spelling of exclusive

@emilio
Copy link
Contributor Author

emilio commented Feb 23, 2018 via email

@shepmaster
Copy link
Member

Ping from triage, @alexcrichton !

@alexcrichton
Copy link
Member

@shepmaster oh I think we're still waiting on numbers

@shepmaster shepmaster added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 3, 2018
@emilio
Copy link
Contributor Author

emilio commented Mar 3, 2018

Yes, numbers, put them under a experimental flag... All those need to happen, haven taken time yet, sorry!

@andjo403
Copy link
Contributor

andjo403 commented Mar 10, 2018

was trying to get some times for this PR and is confused by the result.
$ time ./main 1000000000
-0.169075164
-0.169051540

real 0m38.155s user 0m38.125s sys 0m0.016s
$ time ./main-pgo-use 1000000000
-0.169075164
-0.169051540

real 1m41.192s user 1m41.156s sys 0m0.000s

so the pgo version is 2.6x slower than the non-pgo version :(

What I did to get this numbers
$ git clone https://github.com/Geal/pgo-rust.git
$ cd pgo-rust
$ rustc +stage2 -Copt-level=3  -Ccodegen-units=1 -Cpgo-gen=test.profraw src/main.rs -o main-pgo-gen
$ ./main-pgo-gen 1000000000
$ ~/rust/build/x86_64-unknown-linux-gnu/llvm/bin/llvm-profdata merge -output=pgo.profdata test.profraw
$ rustc +stage2 -C opt-level=3 -Ccodegen-units=1 -Cpgo-use=pgo.profdata src/main.rs -o main-pgo-use
$ rustc +stage2 -C opt-level=3 -Ccodegen-units=1 src/main.rs -o main
$ time ./main         1000000000
$ time ./main-pgo-use 1000000000

@emilio
Copy link
Contributor Author

emilio commented Mar 12, 2018

So, I could sort of repro the numbers from @andjo403, (not so extreme 60 vs 90 seconds, with a stage1 compiler).

Good news is that it works and the profile data can be used to tweak the generated code ;).

Bad news is that it seems to barf at least on that specific test-case :(.

This is the script I've used to profile this:

#!/usr/bin/env bash

set -o errexit

RUSTC=./build/x86_64-unknown-linux-gnu/stage1/bin/rustc
TEST=test.rs
FLAGS="-Copt-level=3 -Ccodegen-units=1 -C save-temps"
COUNT=100000000

rm -rf *.profraw *.profdata

$RUSTC $FLAGS -Cpgo-gen=default.profraw $TEST -o test-pgo-gen
./test-pgo-gen $COUNT

./build/x86_64-unknown-linux-gnu/llvm/bin/llvm-profdata merge -output=pgo.profdata default.profraw

$RUSTC $FLAGS -Cpgo-use=pgo.profdata $TEST -o test-pgo-use
$RUSTC $FLAGS $TEST -o test-no-pgo

time ./test-no-pgo $COUNT
time ./test-pgo-use $COUNT

The llvm IR produced before optimizations is the same for all three executables (I had to disable probestack unconditionally for this to be true).

This is the optimized IR for the pgo-use and no-pgo: https://gist.github.com/emilio/93875578cfd1399a18ce764917f57ab4.

So, so far the IR seems believable (it has a bunch of cold annotations, and it seems to have branch information). It seems not to unroll a loop that might be critical for this test-case (just look at the number of sqrt calls).

So I'll try to gather other test-cases and figure out what's going on, but if someone that knows more about LLVM has any idea, that'd be really nice :)

@emilio
Copy link
Contributor Author

emilio commented Mar 12, 2018

So, a bit more information... It looks that what is destroying performance in this particular test is the instrumentation pre-inlining that LLVM does:

https://github.com/rust-lang/llvm/blob/6ceaaa4b0176a200e4bbd347d6a991ab6c776ede/lib/Transforms/IPO/PassManagerBuilder.cpp#L262-L282

So, with a manual -disable-preinline, perf comes back to baseline values. In this particular test-case, after that fix, and a manual fixup to the profile that I'm investigating (the version number generated claims to be a "Front-end" profile instead of a IR one), I manage to get a LLVM IR with PGO annotations.

At -Copt-level=3, the perf difference for this test-case is just negligible, but I'd want to experiment more, given the IR annotations are correct.

That being said, I think I want to land this for the following tweaks:

  • Moving the -C flags to be -Z.
  • Adding another -Z flag to disable the profiling pre-inliner, to be able to experiment with it easily.
  • Issues filled for the remaining work and investigation.

@alexcrichton @michaelwoerister does that sound reasonable?

I really need to figure out why the profile version is wrong (afaict the profile version variable in the instrumented binary is wrong), but that doesn't block experimentation (I just need to tweak a byte manually).

@alexcrichton
Copy link
Member

@emilio sounds reasonable to me, thanks for the continuing investigation here!

@emilio emilio force-pushed the pgo branch 4 times, most recently from 26c459e to f8a9a5f Compare March 13, 2018 07:58
@michaelwoerister
Copy link
Member

👍

@emilio
Copy link
Contributor Author

emilio commented Mar 13, 2018

I think I'm happy with the current state of the PR.

r? @michaelwoerister

@bors bors added the S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. label Mar 25, 2018
@bors
Copy link
Contributor

bors commented Mar 25, 2018

⌛ Testing commit 5af2f80 with merge e1d05bd6cc37cba62ad232d6edb131cf0aa4d178...

@bors
Copy link
Contributor

bors commented Mar 25, 2018

💔 Test failed - status-appveyor

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 25, 2018
@emilio
Copy link
Contributor Author

emilio commented Mar 25, 2018

Sigh, the relevant tests failed because of:

  = note: libprofiler_builtins-21bf9fe1eea259d7.rlib(InstrProfilingFile.o) : error LNK2019: unresolved external symbol fileno referenced in function doProfileMerging
          libprofiler_builtins-21bf9fe1eea259d7.rlib(InstrProfilingFile.o) : error LNK2019: unresolved external symbol getpid referenced in function getCurFilename
          C:/projects/rust/build/x86_64-pc-windows-msvc/test/run-make-fulldeps/pgo-gen.stage2-x86_64-pc-windows-msvc\test.exe : fatal error LNK1120: 2 unresolved externals

Don't have a windows machine, but last commit should fix, hopefully.

@alexcrichton
Copy link
Member

Hm an error like that probably indicates that this isn't ready to compile on MSVC yet? Or maybe we're not compiling it correctly? Perhaps it should be disabled there?

@emilio
Copy link
Contributor Author

emilio commented Mar 26, 2018

Hm an error like that probably indicates that this isn't ready to compile on MSVC yet? Or maybe we're not compiling it correctly? Perhaps it should be disabled there?

Yeah, I think my patch should fix, it's what we do for other similar functions that in MSVC have an underscore prefix. I'd like to try (I don't have try access on this repo I think), and if it happens not to work I'll disable it in MSVC.

@emilio
Copy link
Contributor Author

emilio commented Mar 26, 2018

s/MSVC/Windows above I guess :)

@alexcrichton
Copy link
Member

@bors: r+

Ok we can try! We will definitely want to scrutinize this before stabilization!

@bors
Copy link
Contributor

bors commented Mar 26, 2018

📌 Commit 1e1d907 has been approved by alexcrichton

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 26, 2018
@andjo403
Copy link
Contributor

maybe this is the same fault that LDC have, they have dissabled IR PGO for windows see ldc-developers/ldc#2539

@emilio
Copy link
Contributor Author

emilio commented Mar 26, 2018

maybe this is the same fault that LDC have, they have dissabled IR PGO for windows see ldc-developers/ldc#2539

That looks like a different error. This was an error linking libprofiler_builtins. Maybe we find the same error now, but I hope not :)

@bors
Copy link
Contributor

bors commented Mar 26, 2018

⌛ Testing commit 1e1d907 with merge 13a86f4...

bors added a commit that referenced this pull request Mar 26, 2018
Add basic PGO support.

This PR adds two mutually exclusive options for profile usage and generation using LLVM's instruction profile generation (the same as clang uses), `-C pgo-use` and `-C pgo-gen`.

See each commit for details.
@bors
Copy link
Contributor

bors commented Mar 26, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 13a86f4 to master...

@bors bors merged commit 1e1d907 into rust-lang:master Mar 26, 2018
@emilio emilio deleted the pgo branch March 26, 2018 15:55
@matthiaskrgr
Copy link
Member

For other people who want try this out: the flag is enabled via -Z and not via -C as stated in the PR summary at the top!

@leonardo-m leonardo-m mentioned this pull request Mar 27, 2018
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 11, 2023
Mark `___asan_globals_registered` as an exported symbol for LTO

Export a weak symbol defined by AddressSanitizer instrumentation.
Previously, when using LTO, the symbol would get internalized and eliminated.

Fixes rust-lang#113404.

---------------

FWIW, let me list similar PRs from the past + who reviewed them:

* rust-lang#68410 (fixing `__msan_keep_going` and `__msan_track_origins`; reviewed by `@varkor)`
* rust-lang#60038 and rust-lang#48346 (fixing `__llvm_profile_raw_version` and `__llvm_profile_filename`; reviewed by `@alexcrichton)`
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 12, 2023
Mark `___asan_globals_registered` as an exported symbol for LTO

Export a weak symbol defined by AddressSanitizer instrumentation.
Previously, when using LTO, the symbol would get internalized and eliminated.

Fixes rust-lang#113404.

---------------

FWIW, let me list similar PRs from the past + who reviewed them:

* rust-lang#68410 (fixing `__msan_keep_going` and `__msan_track_origins`; reviewed by ``@varkor)``
* rust-lang#60038 and rust-lang#48346 (fixing `__llvm_profile_raw_version` and `__llvm_profile_filename`; reviewed by ``@alexcrichton)``
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.