Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Universal arm crossgen compiler #58279

Merged

Conversation

davidwrighton
Copy link
Member

@davidwrighton davidwrighton commented Aug 27, 2021

Stop generating OS specific compilers for the arm and arm64 architectures for use in crossgen2

  • As it turns out the OS differences here are relatively minor and fairly easy to handle with runtime switches.
  • However, when an exact target is known we can generate better code by having the various if statements evaluate to a constant true/false. (This is done for all runtime jit builds)
  • This has been validated to produce binary identical output for generating System.Private.CoreLib.dll on the various architectures.
  • This work reduces the distribution size of crossgen2, and also reduces the amount of time it takes to build everything (Most notably, the Mac build times in PR's are reduced by 5-7 minutes, but improvements are visible across all of/architecture pairs)

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 27, 2021
@ghost
Copy link

ghost commented Aug 27, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Stop generating OS specific compilers for the arm and arm64 architectures for use in crossgen2

  • As it turns out the OS differences here are relatively minor and fairly easy to handle with runtime switches.

  • However, when an exact target is known we can generate better code by having the various if statements evaluate to a constant true/false. (This is done for all runtime jit builds)

  • This work is in progress, as I haven't yet verified that we generate binary identical output with the new universal compilers, but I'm interested in feedback, and finding out how this affects build times in CI

Author: davidwrighton
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@davidwrighton davidwrighton marked this pull request as draft August 27, 2021 18:06
  - Don't special case Windows handling of clrjit.dll, just use the crossgen2 targetted one
- Fix cross compilation targets
  - Build the universal cross clrjit targets, but they don't need to be part of the runtimepack
- Fixup setting of compMatchedVM for macOS X64 in crossgen2
- Adjust superpmi setup to know about the new dll names
- Adjust DEBUG_ARG_SLOTS handling to work with osx arm64 abi. (By disabling anything meaningful)
@davidwrighton davidwrighton marked this pull request as ready for review September 1, 2021 22:01
@davidwrighton
Copy link
Member Author

I don't intend to fix the formatting until all other comments are complete, as formatting fixes have a tendency of making merges a bit harder. I'd like to get a round of feedback dealt with before that.

@davidwrighton
Copy link
Member Author

@dotnet/jit-contrib I'd like to get feedback for this work. It works nicely, and the improvements to PR times, and crossgen2 file size are compelling, but I'd like some guidance on naming and generally ifdef hygiene in my changes.

create_standalone_jit(TARGET clrjit_unix_x64_${ARCH_HOST_NAME} OS unix ARCH x64 DESTINATIONS .)
create_standalone_jit(TARGET clrjit_win_arm64_${ARCH_HOST_NAME} OS win ARCH arm64 DESTINATIONS .)
create_standalone_jit(TARGET clrjit_win_x64_${ARCH_HOST_NAME} OS win ARCH x64 DESTINATIONS .)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are "universal" x64 and x86 also possible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory this is possible, but its quite a bit more difficult and would require a fair amount more changes. The Arm64 and Arm architecture jits produce very similar code, but the x64 ABI is quite different between Unix platforms and Windows (for instance, there are quite a lot of fields on GenTree structures that are different for Unix and Windows.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the various TARGET_ OS defines still exist it seems like might be easy for someone to do the wrong thing and use them when they shouldn't.

I wonder if we could make this more obvious at build time somehow?

#ifdef TARGET_UNIX
#ifdef TARGET_OSX
pEEInfoOut->osType = CORINFO_MACOS;
#elif defined(TARGET_UNIX)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why we bother with faking up EEInfo instead of blowing things up.

@BruceForstall any idea?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the JIT (or JIT32/JIT64) only called getEEInfo() once and reused the info for all method compiles, so only some MCs would have it? Looks like RyuJIT calls it for every compile.

#define compFeatureVarArg() (TargetOS::IsWindows && !TargetArchitecture::IsArm32)
#define compMacOsArm64Abi() (TargetArchitecture::IsArm64 && TargetOS::IsMacOS)
#define compFeatureArgSplit() ( TargetArchitecture::IsArm32 || (TargetOS::IsWindows && TargetArchitecture::IsArm64) )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems odd to have #defines that look like methods; why not just create methods?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turned them into methods. Seems to work fine.

- Undef all TARGET_ OS macros within the JIT so that they can't be used there
- Convert the feature macros to inline functions. Should have equivalent perf on release/chk builds as macro approach, and is less problematic in general.
@davidwrighton
Copy link
Member Author

@AndyAyersMS I've set it up to undef the TARGET_ macros for the OS and it seems to not have broken anything.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Changes LGTM (modulo formatting).

I'd like @BruceForstall to also take a look.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, it looks pretty good, and not too much overhead, either conceptual or runtime.

Misc:

  1. It looks like superpmi-replay.py needs some change to handle the "universal" naming.
  2. Will it work to use superpmi replay with an altjit? E.g., use a Windows arm64 collection with a Linux arm64 altjit? It seems like the collection is going to force the "universal" binary to generate for windows, not Linux.
  3. Looks like the overhead might not be too bad, although we will include CFI support for all builds, even though it's only needed for Linux targeting CoreRT.
  4. There are now more fields that are always available that can be set/read even when illegal. We'll have to think about ways to verify that doesn't happen, maybe new asserts or the like. Like accessing the CFI data array. It's a case of converting compile-time to run-time error.

#ifdef TARGET_UNIX
#ifdef TARGET_OSX
pEEInfoOut->osType = CORINFO_MACOS;
#elif defined(TARGET_UNIX)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the JIT (or JIT32/JIT64) only called getEEInfo() once and reused the info for all method compiles, so only some MCs would have it? Looks like RyuJIT calls it for every compile.

// On all platforms except Arm64 OSX arguments on the stack are taking
// register size slots. On these platforms we could check that stack slots count
// matches our new byte size calculations.
#define DEBUG_ARG_SLOTS_ASSERT(x) assert(compMacOsArm64Abi() || x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#define DEBUG_ARG_SLOTS_ASSERT(x) assert(compMacOsArm64Abi() || x)
#define DEBUG_ARG_SLOTS_ASSERT(x) assert(compMacOsArm64Abi() || (x))

Extra parens around x, for precedence safety.

@@ -2307,8 +2313,8 @@ void CodeGen::genCallInstruction(GenTreeCall* call)
#endif // TARGET_ARM
}
}
#if FEATURE_ARG_SPLIT
else if (curArgTabEntry->IsSplit())
#if FEATURE_ARG_SPLIT_SUPPORTED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments about FEATURE_ARG_SPLIT_SUPPORTED: it's not clear the renaming of the ifdef matters. Also, now that it is always enabled for arm32 & arm64, we could even remove the ifdefs in arm32/arm64-only code files, like codegenarmarch.cpp. Also, here there's no need to check for compFeatureArgSplit() since IsSplit() is also doing that.

#if FEATURE_ARG_SPLIT
else if (tree->OperIsPutArgSplit())
#if FEATURE_ARG_SPLIT_SUPPORTED
else if (compFeatureArgSplit() && tree->OperIsPutArgSplit())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably, in cases like this we don't need to check for compFeatureArgSplit() && because we would never have inserted a GT_PUTARG_SPLIT tree node unless it was legal?

#elif defined(TARGET_WINDOWS) && defined(TARGET_AMD64)
assert(!"Multireg store to SIMD reg not supported on Windows x64");
#elif defined(TARGET_AMD64)
assert(!TargetOS::IsWindows && !"Multireg store to SIMD reg not supported on Windows x64");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(!TargetOS::IsWindows && !"Multireg store to SIMD reg not supported on Windows x64");
assert(!TargetOS::IsWindows || !"Multireg store to SIMD reg not supported on Windows x64");

Shouldn't this use ||?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Oops. Thanks for catching that.

#ifdef TARGET_OS_RUNTIMEDETERMINED
if (!TargetOS::OSSettingConfigured)
{
CORINFO_EE_INFO* eeInfo = eeGetEEInfo();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need this line; it was just queried 5 lines above.

TargetOS::IsMacOS = eeInfo->osType == CORINFO_MACOS;
TargetOS::IsUnix = (eeInfo->osType == CORINFO_UNIX) || (eeInfo->osType == CORINFO_MACOS);
TargetOS::IsWindows = eeInfo->osType == CORINFO_WINNT;
TargetOS::OSSettingConfigured = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe for these to be set as statics? E.g., multi-thread safe?

Do we want to support changing the target on multiple subsequent JIT invocations? E.g., would we want to support merging SuperPMI method contexts for different target OS settings and then replaying them all one after another with the same loaded JIT? Having these be static seems dangerous.

I wonder if they should be on the JIT "host" api ICorJitHost.

return gtOper == GT_PUTARG_SPLIT;
#else // !FEATURE_ARG_SPLIT
#if FEATURE_ARG_SPLIT_SUPPORTED
return compFeatureArgSplit() && (gtOper == GT_PUTARG_SPLIT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another case where it seems like we shouldn't have to check compFeatureArgSplit; we should just ensure we don't create a GT_PUTARG_SPLIT if it's not legal.

@davidwrighton
Copy link
Member Author

@BruceForstall

  1. Ok. I'll make a change, but I don't know how to use that script
  2. Possibly. SuperPMI would need to specify that it is attempting to compile for the target architecture. Something would likely need to change in SuperPMI to make that happen, but I'm not familiar enough with how to use it to feel comfortable trying to make that change.
  3. Yeah, I'm pretty sure the overhead is minimal.
  4. Yes. This is a major concern of mine. Looking at what I needed to do, the number of variables newly exposed in this current change is pretty low, and thus low-risk, but if we want to ever try to merge the X64 jits, it will be highly risky as there are many fields that would come into use.

@BruceForstall
Copy link
Member

Ok. I'll make a change, but I don't know how to use that script

@kunalspathak wrote the superpmi-replay.py script, so he can probably advise.

Re (2): we might need some way for superpmi to override the target OS that is baked into the collection. I guess I wouldn't worry about it for now, as we do have arch/OS matching collections, which most people would use.

@davidwrighton
Copy link
Member Author

@BruceForstall, your comment on initialization made me realize that it was somewhat broken in some multithreading scenarios on Crossgen2, so I plumbed through the Jit a way to set it explicitly, which looks much nicer. I also jammed the OS into superpmi, but it isn't all that pretty. My guess is that superpmi probably works with this change now.

@kunalspathak
Copy link
Member

@kunalspathak wrote the superpmi-replay.py script, so he can probably advise.

Changes in superpmi-replay.py looks good. Some minor changes in other scripts:

diff --git a/src/coreclr/scripts/jitrollingbuild.py b/src/coreclr/scripts/jitrollingbuild.py
index c5bf6d065b5..0003805b499 100644
--- a/src/coreclr/scripts/jitrollingbuild.py
+++ b/src/coreclr/scripts/jitrollingbuild.py
@@ -309,8 +309,8 @@ def upload_command(coreclr_args):

     # Next, look for any and all cross-compilation JITs. These are named, e.g.:
     #   clrjit_unix_x64_x64.dll
-    #   clrjit_win_arm_x64.dll
-    #   clrjit_win_arm64_x64.dll
+    #   clrjit_universal_arm_x64.dll
+    #   clrjit_universal_arm64_x64.dll
     # and so on, and live in the same product directory as the primary JIT.
     #
     # Note that the expression below explicitly filters out the primary JIT since we added that above.
diff --git a/src/coreclr/scripts/superpmi.py b/src/coreclr/scripts/superpmi.py
index c43592897e7..dd69e1cd7c7 100755
--- a/src/coreclr/scripts/superpmi.py
+++ b/src/coreclr/scripts/superpmi.py
@@ -254,7 +254,7 @@ superpmi_common_parser.add_argument("--break_on_error", action="store_true", hel
 superpmi_common_parser.add_argument("--skip_cleanup", action="store_true", help=skip_cleanup_help)
 superpmi_common_parser.add_argument("--sequential", action="store_true", help="Run SuperPMI in sequential mode. Default is to run in parallel for faster runs.")
 superpmi_common_parser.add_argument("-spmi_log_file", help=spmi_log_file_help)
-superpmi_common_parser.add_argument("-jit_name", help="Specify the filename of the jit to use, e.g., 'clrjit_win_arm64_x64.dll'. Default is clrjit.dll/libclrjit.so")
+superpmi_common_parser.add_argument("-jit_name", help="Specify the filename of the jit to use, e.g., 'clrjit_universal_arm64_x64.dll'. Default is clrjit.dll/libclrjit.so")
 superpmi_common_parser.add_argument("--altjit", action="store_true", help="Set the altjit variables on replay.")
 superpmi_common_parser.add_argument("-error_limit", help=error_limit_help)

diff --git a/src/coreclr/scripts/superpmi_replay_setup.py b/src/coreclr/scripts/superpmi_replay_setup.py
index 34cc8301fc5..c6c6fcb507f 100644
--- a/src/coreclr/scripts/superpmi_replay_setup.py
+++ b/src/coreclr/scripts/superpmi_replay_setup.py
@@ -94,7 +94,7 @@ def match_correlation_files(full_path):
     file_name = os.path.basename(full_path)

     if file_name.startswith("clrjit_") and file_name.endswith(".dll") and file_name.find(
-        "osx") == -1 and file_name.find("armel") == -1:
+        "osx") == -1:
         return True

     if file_name == "superpmi.exe" or file_name == "mcs.exe":

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Added a few more comments.

// Some JIT's may support multiple OSs. This api provides a means to specify to the JIT what OS it should
// be trying to compile. This api does not produce any errors, any errors are to be generated by the
// the compileMethod call, which will call back into the VM to ensure bits are correctly setup.
virtual void setJitOs(CORINFO_OS os) = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we use camel casing, but "Os" looks really weird. I don't like "Jit" in the function name.

Proposed name: setTargetOS

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I have no strong opinions about naming.

else if (TargetOS::IsUnix)
{
printf(" - Unix");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a TargetOS::IsMacOS case here, too? (sub-case of IsUnix?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I was replicating the old behavior, but adding a new case for the MacOS case makes sense. Consider it done.

bool TargetOS::IsMacOS = false;
#endif

void CILJit::setJitOs(CORINFO_OS os)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a standard format JIT header comment

@@ -6,6 +6,8 @@
#define _JIT_H_
/*****************************************************************************/

#include "targetosarch.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this should be moved down after #include "corjit.h"

@@ -300,6 +300,23 @@ void CILJit::getVersionIdentifier(GUID* versionIdentifier)
memcpy(versionIdentifier, &JITEEVersionIdentifier, sizeof(GUID));
}

#ifdef TARGET_OS_RUNTIMEDETERMINED
bool TargetOS::OSSettingConfigured = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these default to the TARGET_* defines (before those get #undefed)? As is, it appears that all invokers of the JIT MUST call the new setJitOs API before calling compileMethod. If you want that behavior, then there should be a comment in ICorJitCompiler explicitly stating this ordering requirement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to just make setTargetOS a required call before jitting anything via the comment approach. Making stuff happen before the #undef is likely to be pretty ugly, and the jit doesn't need any more macro goo than it already has.

@@ -47,4 +47,6 @@ void getVersionIdentifier(GUID* versionIdentifier /* OUT */
// intrinsics, so the EE should use the default size (i.e. the size of the IL implementation).
unsigned getMaxIntrinsicSIMDVectorLength(CORJIT_FLAGS cpuCompileFlags); /* { return 0; } */

// Used to specify which OS a JIT should be compiled for. Currently expected to only be called once pre process
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this should be exactly the same comment as in corjit.h

@@ -10,6 +10,14 @@

#define fatMC // this is nice to have on so ildump works...

CORINFO_OS g_currentOs = CORINFO_WINNT;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably add this as a member to interceptor_ICJC, not a global.

And the default should probably be based on the TARGET_* defines, not always Windows. (This would be the same function as getClrVmOs?)

@davidwrighton davidwrighton merged commit fcaba18 into dotnet:main Sep 28, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Nov 3, 2021
@davidwrighton davidwrighton deleted the universal_arm_crossgen_compiler branch April 13, 2023 18:49
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants