Use LTO for linux as well #103058
Seeing the expected TP diffs :)
eng/native/configurecompiler.cmake
Outdated
@@ -942,6 +938,10 @@ if (MSVC)
set(CMAKE_ASM_MASM_FLAGS "${CMAKE_ASM_MASM_FLAGS} /nologo")
endif (MSVC)

set(CMAKE_INTERPROCEDURAL_OPTIMIZATION ON)
Note that this will use -flto=thin. -flto might give better performance at the cost of slower compiles of coreclr. However, there is no cmake flag for this, so it would have to be done using the flags directly.
I'd recommend trying this out in a future PR.
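A minimal sketch of what "using the flags directly" could look like, assuming a Clang toolchain and CMake 3.13 or newer (for add_link_options). The option name CLR_USE_FULL_LTO is hypothetical and not part of this PR; the sketch only illustrates keeping CMake's IPO switch and the explicit flags mutually exclusive.

```cmake
# Hypothetical option, not present in the PR: prefer full LTO over CMake's default flavor.
option(CLR_USE_FULL_LTO "Pass -flto directly instead of using CMake's IPO support" OFF)

if(CLR_USE_FULL_LTO)
  # Turn off CMake's own IPO handling so that only one LTO flavor is requested.
  set(CMAKE_INTERPROCEDURAL_OPTIMIZATION OFF)
  add_compile_options(-flto)
  add_link_options(-flto)
else()
  # With Clang this results in -flto=thin, as noted in the comment above.
  set(CMAKE_INTERPROCEDURAL_OPTIMIZATION ON)
endif()
```

Whether full LTO is actually worth the slower coreclr build would still need to be measured.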
LGTM
What does this mean? I'm guessing it's not a 10-21% performance lift on Linux platforms.
This isn't for code that RyuJIT produces, but rather the code generated by
We might expect some improvements for performance as well since it also enables LTO for VM and GC
Co-authored-by: Jeremy Koritzinsky <jkoritzinsky@gmail.com>
LGTM! Let's add a few comments though so we know why certain decisions were made/are required when we look at this in the future.
@@ -152,6 +153,15 @@ elseif (CLR_CMAKE_HOST_UNIX)
endif()
endif(MSVC)

check_ipo_supported(RESULT result OUTPUT output)
if(result AND NOT CLR_CMAKE_TARGET_APPLE)
Add a comment about why IPO is disabled on apple platforms?
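For reference, a guarded IPO opt-in with CMake's CheckIPOSupported module usually has roughly the shape sketched below. The variable names ipo_supported and ipo_output and the status message are illustrative only. The TODO marks where the requested explanation would go; the actual reason for skipping Apple targets is not stated in this thread.

```cmake
include(CheckIPOSupported)

# Ask CMake whether the active toolchain supports interprocedural optimization.
check_ipo_supported(RESULT ipo_supported OUTPUT ipo_output)

if(NOT ipo_supported)
  message(STATUS "IPO is not supported by this toolchain: ${ipo_output}")
# TODO: document here why IPO is skipped when targeting Apple platforms.
elseif(NOT CLR_CMAKE_TARGET_APPLE)
  set(CMAKE_INTERPROCEDURAL_OPTIMIZATION ON)
endif()
```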
@@ -61,6 +61,7 @@ else()
target_include_directories(cdac_data_descriptor BEFORE PRIVATE ${VM_DIR})
target_include_directories(cdac_data_descriptor BEFORE PRIVATE ${VM_DIR}/${ARCH_SOURCES_DIR})
target_include_directories(cdac_data_descriptor PRIVATE ${CLR_DIR}/interop/inc)
set_target_properties(cdac_data_descriptor PROPERTIES INTERPROCEDURAL_OPTIMIZATION OFF)
Add a comment about why we're disabling IPO here?
If this is only adding runtime/src/coreclr/pgosupport.cmake Line 57 in bfb028b
SPMI jobs seem to explicitly pass
We should not use LTO on native AOT runtime binaries. LTO would place LLVM bitcode (instead of machine code) into the object files that we ship. Such object files can only be linked with a matching LLVM linker plugin. We don't have LLVM or LLVM version requirements for PublishAot.
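One way to keep bitcode out of shipped object files would be to opt the affected targets out of IPO individually, in the same way the diff does for cdac_data_descriptor. A sketch, assuming a hypothetical list of native AOT runtime targets (the target names below are placeholders, not the real ones):

```cmake
# Placeholder target names; substitute the static libraries that ship for PublishAot.
set(AOT_SHIPPED_TARGETS aot_runtime_example aot_gc_example)

foreach(aot_target IN LISTS AOT_SHIPPED_TARGETS)
  if(TARGET ${aot_target})
    # The target property takes precedence over the CMAKE_INTERPROCEDURAL_OPTIMIZATION
    # default, so CMake does not add its LTO flags when compiling these targets.
    set_target_properties(${aot_target} PROPERTIES INTERPROCEDURAL_OPTIMIZATION OFF)
  endif()
endforeach()
```

This only affects the flags CMake adds through its IPO support. If -flto is also passed explicitly through the compiler flags, those targets would need separate handling.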
Tagging subscribers to this area: @hoyosjs
does this interact with the -flto we set here?
@kunalspathak Could you please shed some light on this?
If we are specifying both -flto and -flto=thin on the command line when building the shipping binaries, there is a chance that -flto=thin is specified second, overrides the full -flto, and we will see a large-scale perf regression.
It would be best to avoid specifying both -flto and -flto=thin on the command line when building the shipping binaries, so that the two options cannot override each other.
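A configure-time guard along the following lines could catch the combination early. This is a sketch under assumptions: it only inspects CMAKE_C_FLAGS and CMAKE_CXX_FLAGS, so a -flto added elsewhere (for example via add_compile_options) would not be detected, and the warning text is illustrative.

```cmake
# Warn when CMake's IPO support is on while a bare -flto is also present in the flags,
# since both LTO flavors would then end up on the same command line.
foreach(lang C CXX)
  if(CMAKE_INTERPROCEDURAL_OPTIMIZATION AND "${CMAKE_${lang}_FLAGS}" MATCHES "(^| )-flto( |$)")
    message(WARNING
      "CMAKE_${lang}_FLAGS contains -flto while CMAKE_INTERPROCEDURAL_OPTIMIZATION is ON; "
      "the two LTO options may override each other.")
  endif()
endforeach()
```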
So my impression was that
Agree that we should not pessimize the flags to get regression on shipped binaries.
So I am trying to understand how this is working currently. If I compare the TP difference with and without the
However, with this PR, even when compiled with
Do you see packages with PGO optimization data being downloaded and the data being successfully applied by the compiler?
Yes, that would make sense. If nothing else, it should reduce the gap between the shipping config and what you are measuring.
If you set the
https://gitlab.kitware.com/cmake/cmake/-/issues/23136 tracks being able to specify the flavor of LTO in CMake.
No description provided.