[CMake][PGO] Build Sema.cpp to generate profdata for PGO builds #77347

Open · wants to merge 7 commits into main · changes shown from 6 commits
22 changes: 20 additions & 2 deletions clang/utils/perf-training/CMakeLists.txt
@@ -1,6 +1,10 @@
+include(LLVMExternalProjectUtils)
+
 set(CLANG_PGO_TRAINING_DATA "${CMAKE_CURRENT_SOURCE_DIR}" CACHE PATH
   "The path to a lit testsuite containing samples for PGO and order file generation"
   )
+set(CLANG_PGO_TRAINING_DATA_SOURCE_DIR OFF CACHE STRING "Path to source directory containing cmake project with source files to use for generating pgo data")
+set(CLANG_PERF_TRAINING_DEPS "" CACHE STRING "Extra dependencies needed to build the PGO training data.")
 
 if(LLVM_BUILD_INSTRUMENTED)
   configure_lit_site_cfg(
@@ -15,7 +19,7 @@ if(LLVM_BUILD_INSTRUMENTED)
   )
 
   add_custom_target(clear-profraw
-    COMMAND "${Python3_EXECUTABLE}" ${CMAKE_CURRENT_SOURCE_DIR}/perf-helper.py clean ${CMAKE_CURRENT_BINARY_DIR} profraw
+    COMMAND "${Python3_EXECUTABLE}" ${CMAKE_CURRENT_SOURCE_DIR}/perf-helper.py clean ${CMAKE_CURRENT_BINARY_DIR} ${CMAKE_BINARY_DIR}/profiles/ profraw
     COMMENT "Clearing old profraw data")
 
   if(NOT LLVM_PROFDATA)
@@ -26,9 +30,23 @@ if(LLVM_BUILD_INSTRUMENTED)
     message(STATUS "To enable merging PGO data LLVM_PROFDATA has to point to llvm-profdata")
   else()
     add_custom_target(generate-profdata
-      COMMAND "${Python3_EXECUTABLE}" ${CMAKE_CURRENT_SOURCE_DIR}/perf-helper.py merge ${LLVM_PROFDATA} ${CMAKE_CURRENT_BINARY_DIR}/clang.profdata ${CMAKE_CURRENT_BINARY_DIR}
+      COMMAND "${Python3_EXECUTABLE}" ${CMAKE_CURRENT_SOURCE_DIR}/perf-helper.py merge ${LLVM_PROFDATA} ${CMAKE_CURRENT_BINARY_DIR}/clang.profdata ${CMAKE_CURRENT_BINARY_DIR} ${CMAKE_BINARY_DIR}/profiles/
       COMMENT "Merging profdata"
       DEPENDS generate-profraw)
+    if (CLANG_PGO_TRAINING_DATA_SOURCE_DIR)
+      llvm_ExternalProject_Add(generate-profraw-external ${CLANG_PGO_TRAINING_DATA_SOURCE_DIR}
+        USE_TOOLCHAIN EXCLUDE_FROM_ALL NO_INSTALL DEPENDS generate-profraw)
+      add_dependencies(generate-profdata generate-profraw-external)
+    else()
+      # Default to compiling a file from clang. This also builds all the
+      # dependencies needed to build this file, like TableGen.
+      set(generate_profraw_clang_sema tools/clang/lib/Sema/CMakeFiles/obj.clangSema.dir/Sema.cpp.o)
+      llvm_ExternalProject_Add(generate-profraw-clang ${CMAKE_CURRENT_SOURCE_DIR}/../../../llvm
+        USE_TOOLCHAIN EXCLUDE_FROM_ALL NO_INSTALL DEPENDS generate-profraw
+        EXTRA_TARGETS generate_profraw_clang_sema
Collaborator:

This is going to be really fragile. The paths of object-file outputs are an implementation detail of CMake that has changed in the past and could change in the future; also, I think only the Makefile and Ninja generators actually export those as buildable targets.

Beyond that, I also have a philosophical problem with the approach of using living clang sources as training data. I think this can result in unexplainable or difficult-to-diagnose differences in the efficacy of PGO, because as you change the compiler you're also innately changing the training data. A potentially better solution would be to have a preprocessed Sema.cpp from a recent LLVM release that we check in as a stable copy. This would be similar to how the GCC compile-time benchmark works.
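
For illustration, compiling a checked-in preprocessed source could look roughly like the sketch below; the `training/Sema-release.ii` file and `generate-profraw-stable` target names are hypothetical, not part of this patch:

```cmake
# Rough sketch (hypothetical names): compile a stable, checked-in
# preprocessed source with the just-built instrumented clang instead of
# the live Sema.cpp.
add_custom_target(generate-profraw-stable
  COMMAND $<TARGET_FILE:clang> -c -std=c++17
          ${CMAKE_CURRENT_SOURCE_DIR}/training/Sema-release.ii
          -o ${CMAKE_CURRENT_BINARY_DIR}/training.o
  COMMENT "Compiling stable PGO training source")
add_dependencies(generate-profraw-stable clang)
add_dependencies(generate-profdata generate-profraw-stable)
```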

Collaborator (Author):

> This is going to be really fragile. The paths of object-file outputs are an implementation detail of CMake that has changed in the past and could change in the future; also, I think only the Makefile and Ninja generators actually export those as buildable targets.

What about using the library target clangSema instead of the individual file?

> Beyond that, I also have a philosophical problem with the approach of using living clang sources as training data. I think this can result in unexplainable or difficult-to-diagnose differences in the efficacy of PGO, because as you change the compiler you're also innately changing the training data. A potentially better solution would be to have a preprocessed Sema.cpp from a recent LLVM release that we check in as a stable copy. This would be similar to how the GCC compile-time benchmark works.

The first thing I tried to do was add a preprocessed version of CommentSema.cpp, but the file was 10MB, and I thought that might be too big. Sema.cpp preprocessed is 5MB, which is smaller but still pretty big. I would be OK with this approach if we think it's OK to add a big file like this to the repo.

I'm also not concerned about efficacy of PGO, because building Sema.cpp is just the default. I really want there to be a good default experience for users, so that they can use the PGO cache files and things will 'just work'. This patch also adds a configuration option for advanced users that need something more consistent.
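
A minimal sketch of that alternative, reusing the patch's variable-indirection pattern for EXTRA_TARGETS (untested, and it assumes the external build exposes the clangSema library target):

```cmake
# Hypothetical variant: point the local helper target at the clangSema
# library target rather than a generator-specific object-file path.
set(generate_profraw_clang_sema clangSema)
llvm_ExternalProject_Add(generate-profraw-clang ${CMAKE_CURRENT_SOURCE_DIR}/../../../llvm
  USE_TOOLCHAIN EXCLUDE_FROM_ALL NO_INSTALL DEPENDS generate-profraw
  EXTRA_TARGETS generate_profraw_clang_sema
  CMAKE_ARGS -DLLVM_ENABLE_PROJECTS=clang)
add_dependencies(generate-profdata generate_profraw_clang_sema)
```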

Collaborator:

IMO, the PGO use case is already an advanced-user feature. It's even documented in the "Advanced Builds" documentation.

If we want some more complex C++ sources, there are some samples we could look at, like Kaleidoscope. The Kaleidoscope chapter 2 source builds without dependencies, and the other ones are also pretty simple to just compile.

If you're not concerned about the efficacy of the PGO build, I'm not sure why the existing single hello_world is a bad default. It isn't great but it is stable and something. If you want something that gives better performance then you must care about the efficacy, and I think you need to be somewhat concerned about the stability of those results, otherwise what's the point?
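
For the Kaleidoscope idea above, one possible wiring, again following the patch's external-project mechanism (a sketch only; the `Kaleidoscope-Ch2` target name assumes the examples are enabled):

```cmake
# Hypothetical sketch: train on the dependency-free Kaleidoscope chapter 2
# example instead of living clang sources.
set(generate_profraw_kaleidoscope Kaleidoscope-Ch2)
llvm_ExternalProject_Add(generate-profraw-clang ${CMAKE_CURRENT_SOURCE_DIR}/../../../llvm
  USE_TOOLCHAIN EXCLUDE_FROM_ALL NO_INSTALL DEPENDS generate-profraw
  EXTRA_TARGETS generate_profraw_kaleidoscope
  CMAKE_ARGS -DLLVM_ENABLE_PROJECTS=clang -DLLVM_BUILD_EXAMPLES=ON)
add_dependencies(generate-profdata generate_profraw_kaleidoscope)
```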

Collaborator (Author):

> If you're not concerned about the efficacy of the PGO build, I'm not sure why the existing single hello_world is a bad default. It isn't great but it is stable and something. If you want something that gives better performance then you must care about the efficacy, and I think you need to be somewhat concerned about the stability of those results, otherwise what's the point?

I'm concerned about the efficacy if it is 0%, but not really concerned if it bounces around between, say, 10% and 25% based on updates to the file we are using to train.

I tested with the existing default (hello_world) and I didn't see any performance improvements from that. We also did some manual builds (not using the 2-stage PGO cache) and could not get any performance gains from using a hello world source. How certain are we that hello_world is enough to get performance improvements from PGO? Has anyone tested this recently? I will re-run my tests just to double check.

Collaborator:

I saw small gains when I added it, but that was years ago. My confidence in it having any meaningful gain today is close to 0. I still think using training sources that have no stability is a bad idea. I'd feel much more comfortable using the LLVM examples or some other coding samples as testing sources.

Member:

The idea I'd like to explore is using the libc++ (and perhaps also libc) test suite for training. It consists of self-contained test programs which should sufficiently exercise various aspects of the C/C++ frontend.

Collaborator (Author):

> The idea I'd like to explore is using the libc++ (and perhaps also libc) test suite for training. It consists of self-contained test programs which should sufficiently exercise various aspects of the C/C++ frontend.

I've updated the patch to use the check-cxx target to generate the profile data. The gains (~20%) are less than with Sema.cpp (~34%) but still pretty good. The only issue is that the profile data comes from building libcxx as well, so it's not just from the test suite.

It may be possible to get test data from just running the test suite, but that might be a lot more complicated to set up.
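
Roughly how the check-cxx wiring might look, following the same pattern (a sketch under assumptions: the actual updated patch may differ, and the runtimes arguments below are illustrative):

```cmake
# Hypothetical sketch: build and run the libc++ test suite (check-cxx)
# with the just-built instrumented clang to generate profile data.
set(generate_profraw_check_cxx check-cxx)
llvm_ExternalProject_Add(generate-profraw-runtimes ${CMAKE_CURRENT_SOURCE_DIR}/../../../runtimes
  USE_TOOLCHAIN EXCLUDE_FROM_ALL NO_INSTALL DEPENDS generate-profraw
  EXTRA_TARGETS generate_profraw_check_cxx
  CMAKE_ARGS "-DLLVM_ENABLE_RUNTIMES=libcxx\;libcxxabi")
add_dependencies(generate-profdata generate_profraw_check_cxx)
```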

Contributor:

> The idea I'd like to explore is using the libc++ (and perhaps also libc) test suite for training. It consists of self-contained test programs which should sufficiently exercise various aspects of the C/C++ frontend.

@petrhosek, are you suggesting we use libc++ from in-tree, or from a fixed, already-released version as @llvm-beanz suggested here? (He didn't suggest libc++ itself, but sources from a recent LLVM release.)

Member:

@tstellar For the numbers you reported (~34% and ~20%), what did you use to collect those numbers?

Collaborator (Author):

I was compiling just a single file (SemaChecking.cpp).

+        CMAKE_ARGS -DLLVM_ENABLE_PROJECTS=clang)
+      add_dependencies(generate-profdata generate_profraw_clang_sema)
+    endif()
   endif()
 endif()
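
Putting the new options together, a user-side cache fragment (passed with `cmake -C`) might look like this; the path is illustrative, not from the patch:

```cmake
# Hypothetical cache fragment: train PGO on an external CMake project and
# build the runtimes first (the path below is a placeholder).
set(CLANG_PGO_TRAINING_DATA_SOURCE_DIR "/path/to/llvm-test-suite" CACHE STRING "")
set(CLANG_PERF_TRAINING_DEPS "runtimes" CACHE STRING "")
```

In a two-stage PGO build these would be passed with the BOOTSTRAP_ prefix, as the documentation change below explains.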

16 changes: 9 additions & 7 deletions clang/utils/perf-training/perf-helper.py
@@ -30,26 +30,28 @@ def findFilesWithExtension(path, extension):


 def clean(args):
-    if len(args) != 2:
+    if len(args) < 2:
         print(
-            "Usage: %s clean <path> <extension>\n" % __file__
+            "Usage: %s clean <paths> <extension>\n" % __file__
             + "\tRemoves all files with extension from <path>."
         )
         return 1
-    for filename in findFilesWithExtension(args[0], args[1]):
-        os.remove(filename)
+    for path in args[:-1]:
+        for filename in findFilesWithExtension(path, args[-1]):
+            os.remove(filename)
     return 0
 
 
 def merge(args):
-    if len(args) != 3:
+    if len(args) < 3:
         print(
-            "Usage: %s merge <llvm-profdata> <output> <path>\n" % __file__
+            "Usage: %s merge <llvm-profdata> <output> <paths>\n" % __file__
             + "\tMerges all profraw files from path into output."
         )
         return 1
     cmd = [args[0], "merge", "-o", args[1]]
-    cmd.extend(findFilesWithExtension(args[2], "profraw"))
+    for i in range(2, len(args)):
+        cmd.extend(findFilesWithExtension(args[i], "profraw"))
     subprocess.check_call(cmd)
     return 0

23 changes: 23 additions & 0 deletions llvm/docs/AdvancedBuilds.rst
@@ -145,6 +145,29 @@ that also enables ThinLTO, use the following command:
     -DPGO_INSTRUMENT_LTO=Thin \
     <path to source>/llvm
 
+By default, clang will generate profile data by compiling the Sema.cpp
+file (and all its dependencies, e.g. TableGen). However, you can also
+tell clang to use an external project for generating profile data that may
+be a better fit for your use case. The project you specify must either be
+a lit test suite (use the CLANG_PGO_TRAINING_DATA option) or a CMake
+project (use the CLANG_PGO_TRAINING_DATA_SOURCE_DIR option).
+
+For example, if you wanted to use the
+`LLVM Test Suite <https://github.com/llvm/llvm-test-suite/>`_ to generate
+profile data, you would use the following command:
+
+.. code-block:: console
+
+  $ cmake -G Ninja -C <path to source>/clang/cmake/caches/PGO.cmake \
+      -DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=/home/fedora/llvm-test-suite/ \
+      -DBOOTSTRAP_CLANG_PERF_TRAINING_DEPS=runtimes <path to source>/llvm
+
+The BOOTSTRAP\_ prefix tells CMake to pass the variables on to the instrumented
+stage two build, and the CLANG_PERF_TRAINING_DEPS option lets you specify
+additional build targets to build before building the external project. The
+LLVM Test Suite requires compiler-rt to build, so we need to add the
+`runtimes` target as a dependency.
+
 After configuration, building the stage2-instrumented-generate-profdata target
 will automatically build the stage1 compiler, build the instrumented compiler
 with the stage1 compiler, and then run the instrumented compiler against the