[WIP] Windows return simd #32278
Conversation
if (jitMgr->LoadJIT())
{
    CORJIT_FLAGS cpuCompileFlags = jitMgr->GetCPUCompileFlags();
    if (cpuCompileFlags.IsSet(CORJIT_FLAGS::CORJIT_FLAG_FEATURE_SIMD))
It does not feel right to vary the calling convention based on these flags. Would it be better for CORJIT_FLAG_FEATURE_SIMD to only control whether hardware intrinsics are enabled, but not alter the calling convention?
IIRC, we don't have support for any of the TYP_SIMD stuff without this flag. So there might be a bit more involved in recognizing the type and getting things to work in that scenario.
Should this flag be unconditionally set on Windows x64 then?
I believe that is currently the case for the mainline build:
runtime/src/coreclr/src/jit/CMakeLists.txt
Lines 13 to 16 in 5040afe
if (CLR_CMAKE_TARGET_ARCH_AMD64 OR CLR_CMAKE_TARGET_ARCH_ARM64 OR (CLR_CMAKE_TARGET_ARCH_I386 AND NOT CLR_CMAKE_HOST_UNIX))
  add_definitions(-DFEATURE_SIMD)
  add_definitions(-DFEATURE_HW_INTRINSICS)
endif ()
@CarolEidt might know if there are any cases where it is or could be disabled (at least from a quick grep, it is disabled for armelnonjit, linuxnonjit, and protononjit).
I added this weird quirk for the no-SIMD testing scenario, but as you say, it's not really supported or generally tested. Would there be any objection to removing the ability to disable FEATURE_SIMD on Windows x64?
It is unlikely we will ever support an x86 platform without basic SSE support, so I think disabling SIMD support doesn't make much sense except for esoteric test scenarios that, it turns out, we don't use much anyway. A much more interesting question is ARM64 on Linux, and ARM32.
Arm32 doesn't support SIMD today, so an ABI shift based on FEATURE_SIMD support will be necessary when we add support for Vector<T>, as the ABI of Vector<T> is fundamentally based on using a vector register. However, I think it is reasonable for the Vector<T> ABI to vary based on such switches, as it really is supposed to be dynamic.
For Arm64, we support AdvSimd and floating point in the baseline, and according to the comment in pal\src\misc\jitsupport.cpp we don't support platforms without that support, though we do actually have checks that call into the kernel to ask whether such support exists. It would be quite reasonable for Vector<T> to use SVE in the future, and using SVE might imply a different calling convention. That adjusted calling convention would be problematic, as the architectural maximum size of an SVE register is 2048 bits, which would bloat our transition block structure enormously.
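To make the size concern concrete, here is a back-of-the-envelope sketch. This is illustrative arithmetic only and does not reproduce the real TransitionBlock layout; the count of eight argument vector registers is the AAPCS64 number, and the SVE width used is the architectural worst case.

// Illustrative arithmetic only; this just shows how much a maximum-width SVE
// register file would grow a spill area compared to 128-bit AdvSimd registers.
#include <cstdio>

int main()
{
    constexpr int vectorArgRegs  = 8;         // v0-v7 carry FP/SIMD arguments under the AAPCS64
    constexpr int neonRegBytes   = 128 / 8;   // 128-bit AdvSimd register
    constexpr int sveMaxRegBytes = 2048 / 8;  // architectural maximum SVE register width

    std::printf("AdvSimd-sized spill area: %d bytes\n", vectorArgRegs * neonRegBytes);   // 128 bytes
    std::printf("SVE-max spill area:       %d bytes\n", vectorArgRegs * sveMaxRegBytes); // 2048 bytes
    return 0;
}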
As far as "supported calling convention on a platform" goes, the SysV Intel ABI on Linux/macOS is a bit peculiar here. It allows use of the vector registers as long as all of the code is compiled as machine aware, but the struct types work regardless. For instance, if you compile a library as AVX-aware with clang on Linux and expose an ABI that manipulates an __m256, and then call that function from another binary that was not compiled as AVX-aware but does use the __m256 structure, the same C++ compiler will generate conflicting ABIs, and things will break. Exposing interop support will require us to be able to address such issues, and reasons like that are why this work will not directly enable vector marshalling support.
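A minimal sketch of that mismatch (the function name and compile commands are illustrative, not from this PR):

// Hypothetical library translation unit, compiled AVX-aware, e.g.:
//   clang++ -mavx -O2 -c simd_abi_sketch.cpp
// With -mavx, the __m256 argument arrives in ymm0 and the result is returned
// in ymm0. A caller built from a separate translation unit WITHOUT -mavx that
// declares the same prototype will use a memory-based convention for __m256
// instead (clang/gcc diagnose this with a "... changes the ABI" message), so
// the two binaries silently disagree about how the value is passed.
#include <immintrin.h>

__m256 scale_by(__m256 v, float s)
{
    return _mm256_mul_ps(v, _mm256_set1_ps(s));
}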
For Windows, matters are a bit simpler. On Windows the C++ ABI does not vary by compiler switch; the compiler switches do, however, determine whether a __m256 type is usable at all.
The Arm32/64 ABI requires use of SIMD registers for the SIMD types; Vector128<T> should only be passed in SIMD registers, etc.
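As a small illustration of that last point (a standalone sketch, not code from this PR): in native AArch64 code a 128-bit short vector is already passed and returned in a SIMD register, which is the convention the managed Vector128<T> ABI has to line up with.

// AArch64-only sketch: under the AAPCS64, float32x4_t is a 128-bit short
// vector, so 'v' arrives in v0, the scalar 's' in s1 (the next FP/SIMD
// argument register), and the result is returned in v0.
#include <arm_neon.h>

float32x4_t scale_vector(float32x4_t v, float s)
{
    return vmulq_n_f32(v, s);
}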
"no simd testing scenario"
We should have a way to disable hardware accelerated Vector and the hardware intrinsics IsSupported methods so that people can test their fallback paths. But the calling convention can stay the same, even when these are disabled.
I agree with @jkotas here - we need to have a way to test fallback paths (though I would note that this should really only impact the ability to test those fallback paths on a platform that would otherwise not require them), and it may be that changing the behavior of FEATURE_SIMD in this way will require additional changes, but if so, those changes should be made.
"We should have a way to disable hardware accelerated Vector and the hardware intrinsics IsSupported methods so that people can test their fallback paths."
In the case of HWIntrinsics, this is done via the COMPlus_Enable{ISA}=0 or COMPlus_EnableHWIntrinsic=0 flags (a small usage sketch follows at the end of this comment); see
runtime/src/coreclr/src/jit/jitconfigvalues.h
Lines 226 to 272 in fcd862e
CONFIG_INTEGER(EnableHWIntrinsic, W("EnableHWIntrinsic"), 1) // Enable Base
CONFIG_INTEGER(EnableSSE, W("EnableSSE"), 1) // Enable SSE
CONFIG_INTEGER(EnableSSE2, W("EnableSSE2"), 1) // Enable SSE2
CONFIG_INTEGER(EnableSSE3, W("EnableSSE3"), 1) // Enable SSE3
CONFIG_INTEGER(EnableSSSE3, W("EnableSSSE3"), 1) // Enable SSSE3
CONFIG_INTEGER(EnableSSE41, W("EnableSSE41"), 1) // Enable SSE41
CONFIG_INTEGER(EnableSSE42, W("EnableSSE42"), 1) // Enable SSE42
CONFIG_INTEGER(EnableAVX, W("EnableAVX"), 1) // Enable AVX
CONFIG_INTEGER(EnableAVX2, W("EnableAVX2"), 1) // Enable AVX2
CONFIG_INTEGER(EnableFMA, W("EnableFMA"), 1) // Enable FMA
CONFIG_INTEGER(EnableAES, W("EnableAES"), 1) // Enable AES
CONFIG_INTEGER(EnableBMI1, W("EnableBMI1"), 1) // Enable BMI1
CONFIG_INTEGER(EnableBMI2, W("EnableBMI2"), 1) // Enable BMI2
CONFIG_INTEGER(EnableLZCNT, W("EnableLZCNT"), 1) // Enable LZCNT
CONFIG_INTEGER(EnablePCLMULQDQ, W("EnablePCLMULQDQ"), 1) // Enable PCLMULQDQ
CONFIG_INTEGER(EnablePOPCNT, W("EnablePOPCNT"), 1) // Enable POPCNT
#else // !defined(TARGET_AMD64) && !defined(TARGET_X86)
// Enable AVX instruction set for wide operations as default
CONFIG_INTEGER(EnableAVX, W("EnableAVX"), 0)
#endif // !defined(TARGET_AMD64) && !defined(TARGET_X86)
// clang-format off
#if defined(TARGET_ARM64)
CONFIG_INTEGER(EnableHWIntrinsic, W("EnableHWIntrinsic"), 1)
CONFIG_INTEGER(EnableArm64Aes, W("EnableArm64Aes"), 1)
CONFIG_INTEGER(EnableArm64Atomics, W("EnableArm64Atomics"), 1)
CONFIG_INTEGER(EnableArm64Crc32, W("EnableArm64Crc32"), 1)
CONFIG_INTEGER(EnableArm64Dcpop, W("EnableArm64Dcpop"), 1)
CONFIG_INTEGER(EnableArm64Dp, W("EnableArm64Dp"), 1)
CONFIG_INTEGER(EnableArm64Fcma, W("EnableArm64Fcma"), 1)
CONFIG_INTEGER(EnableArm64Fp, W("EnableArm64Fp"), 1)
CONFIG_INTEGER(EnableArm64Fp16, W("EnableArm64Fp16"), 1)
CONFIG_INTEGER(EnableArm64Jscvt, W("EnableArm64Jscvt"), 1)
CONFIG_INTEGER(EnableArm64Lrcpc, W("EnableArm64Lrcpc"), 1)
CONFIG_INTEGER(EnableArm64Pmull, W("EnableArm64Pmull"), 1)
CONFIG_INTEGER(EnableArm64Sha1, W("EnableArm64Sha1"), 1)
CONFIG_INTEGER(EnableArm64Sha256, W("EnableArm64Sha256"), 1)
CONFIG_INTEGER(EnableArm64Sha512, W("EnableArm64Sha512"), 1)
CONFIG_INTEGER(EnableArm64Sha3, W("EnableArm64Sha3"), 1)
CONFIG_INTEGER(EnableArm64AdvSimd, W("EnableArm64AdvSimd"), 1)
CONFIG_INTEGER(EnableArm64AdvSimd_v81, W("EnableArm64AdvSimd_v81"), 1)
CONFIG_INTEGER(EnableArm64AdvSimd_Fp16, W("EnableArm64AdvSimd_Fp16"), 1)
CONFIG_INTEGER(EnableArm64Sm3, W("EnableArm64Sm3"), 1)
CONFIG_INTEGER(EnableArm64Sm4, W("EnableArm64Sm4"), 1)
CONFIG_INTEGER(EnableArm64Sve, W("EnableArm64Sve"), 1)
#endif // defined(TARGET_ARM64)
I don't believe we have a flag for Vector<T> today.
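As a usage note (a standalone sketch, not CoreCLR's actual configuration reader, which goes through the CLRConfig/JitConfig machinery quoted above): the COMPlus_* knobs are plain environment variables with an enabled-by-default semantic, so exercising a fallback path is just a matter of launching the test process with, for example, COMPlus_EnableAVX2=0.

// Minimal sketch of an enabled-by-default environment-variable knob; the
// knob name mirrors the table above, but the parsing here is simplified and
// is not the CoreCLR implementation.
#include <cstdlib>
#include <cstdio>

static bool IsKnobEnabled(const char* name)
{
    const char* value = std::getenv(name);
    if (value == nullptr)
    {
        return true;                  // unset means enabled, matching the default of 1 above
    }
    return std::atoi(value) != 0;     // "0" disables the corresponding code paths
}

int main()
{
    std::printf("AVX2 code paths: %s\n",
                IsKnobEnabled("COMPlus_EnableAVX2") ? "enabled" : "disabled");
    return 0;
}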
Correct; that's what FEATURE_SIMD was generally used for, and I was just indicating that we need a way to disable Vector<T> without impacting the calling convention, and that doing so will probably require additional changes.
@@ -1550,6 +1550,29 @@ void ArgIteratorTemplate<ARGITERATOR_BASE>::ComputeReturnFlags()
}
#endif

#if defined(_TARGET_AMD64_) && !defined(UNIX_AMD64_ABI)
Suggested change:
#if defined(_TARGET_AMD64_) && !defined(UNIX_AMD64_ABI)
#if defined(TARGET_AMD64) && !defined(UNIX_AMD64_ABI)
Recent rename: _TARGET_AMD64_ was recently renamed to TARGET_AMD64.
The branch was force-pushed from 38b2a67 to f4f73f2 with the commit: "The calculation of offset was based on ENREGISTERED_RETURNTYPE_MAXSIZE, but not handled symbolically."
}
else if ((norm_struct_type == TYP_SIMD32) && (vectorRegSizeForReturn == 32))
{
    // TYP_SIMD16 should be returned in YMM0
Typo: this comment should say TYP_SIMD32. (There is one more occurrence of this below.)
Auto-generated message: 69e114c, which was merged on 12/7, removed the intermediate src/coreclr/src/ folder. This PR needs to be updated because it touches files in that directory, which causes conflicts. To update your commits you can use this bash script: https://gist.github.com/ViktorHofer/6d24f62abdcddb518b4966ead5ef3783. Feel free to use the comment section of the gist to improve the script for others.
Draft Pull Request was automatically closed for inactivity. It can be manually reopened in the next 30 days if the work resumes.