[WIP] Windows return simd #32278
Conversation
if (jitMgr->LoadJIT())
{
    CORJIT_FLAGS cpuCompileFlags = jitMgr->GetCPUCompileFlags();
    if (cpuCompileFlags.IsSet(CORJIT_FLAGS::CORJIT_FLAG_FEATURE_SIMD))
It does not feel right to vary the calling convention based on these flags. Would it be better for CORJIT_FLAG_FEATURE_SIMD to only control whether hardware intrinsics are enabled, but not alter the calling convention?
IIRC, we don't have support for any of the TYP_SIMD stuff without this flag. So there might be a bit more involved in recognizing the type and getting things to work in that scenario.
Should this flag be unconditionally set on Windows x64 then?
I believe that is currently the case for the mainline build:
runtime/src/coreclr/src/jit/CMakeLists.txt
Lines 13 to 16 in 5040afe
if (CLR_CMAKE_TARGET_ARCH_AMD64 OR CLR_CMAKE_TARGET_ARCH_ARM64 OR (CLR_CMAKE_TARGET_ARCH_I386 AND NOT CLR_CMAKE_HOST_UNIX))
  add_definitions(-DFEATURE_SIMD)
  add_definitions(-DFEATURE_HW_INTRINSICS)
endif ()
@CarolEidt might know if there are any cases where it is or could be disabled (at least from a quick grep, it is disabled for armelnonjit, linuxnonjit, and protononjit).
I added this weird quirk for the no-SIMD testing scenario, but as you say, it's not really supported or generally tested. Would there be any objection to removing the ability to disable FEATURE_SIMD on Windows x64?
It is unlikely we will ever support an x86 platform without basic SSE support, so I think disabling SIMD support doesn't make much sense except for esoteric test scenarios that, it turns out, we don't use much anyway. A much more interesting question is ARM64 on Linux, and ARM32.
Arm32 doesn't support SIMD today, so an ABI shift based on FEATURE_SIMD support will be necessary when we add support for Vector<T>, as the ABI of Vector<T> is fundamentally based on using a vector register. However, I think it is reasonable for the Vector<T> ABI to vary based on such switches, as it really is supposed to be dynamic.
For Arm64, we support AdvSimd and floating point in the baseline, and according to the comment in pal\src\misc\jitsupport.cpp we don't support platforms without that support, though we do actually have checks that call into the kernel to ask whether such support exists. It would be quite reasonable for Vector<T> to use SVE in the future, and using SVE might imply a different calling convention. That adjusted calling convention would be problematic, as the architectural maximum size of an SVE register is 2048 bits, which would bloat our transition block structure enormously.
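To make the size concern concrete, here is a back-of-the-envelope sketch. This is illustrative arithmetic only and does not reproduce the real TransitionBlock layout; the count of eight argument vector registers is the AAPCS64 number, and the SVE width used is the architectural worst case.

// Illustrative arithmetic only; this just shows how much a maximum-width SVE
// register file would grow a spill area compared to 128-bit AdvSimd registers.
#include <cstdio>

int main()
{
    constexpr int vectorArgRegs  = 8;         // v0-v7 carry FP/SIMD arguments under the AAPCS64
    constexpr int neonRegBytes   = 128 / 8;   // 128-bit AdvSimd register
    constexpr int sveMaxRegBytes = 2048 / 8;  // architectural maximum SVE register width

    std::printf("AdvSimd-sized spill area: %d bytes\n", vectorArgRegs * neonRegBytes);   // 128 bytes
    std::printf("SVE-max spill area:       %d bytes\n", vectorArgRegs * sveMaxRegBytes); // 2048 bytes
    return 0;
}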
As far as "supported calling convention on a platform" goes, the SysV Intel ABI on Linux/macOS is a bit peculiar here. It allows use of the vector registers as long as all of the code is compiled as machine aware, but the struct types work regardless. For instance, if you compile a library as AVX-aware with clang on Linux and expose an ABI that manipulates an __m256, and then call that function from another binary that was not compiled as AVX-aware but does use the __m256 structure, the same C++ compiler will generate conflicting ABIs, and things will break. Exposing interop support will require us to be able to address such issues, and reasons like that are why this work will not directly enable vector marshalling support.
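A minimal sketch of that mismatch (the function name and compile commands are illustrative, not from this PR):

// Hypothetical library translation unit, compiled AVX-aware, e.g.:
//   clang++ -mavx -O2 -c simd_abi_sketch.cpp
// With -mavx, the __m256 argument arrives in ymm0 and the result is returned
// in ymm0. A caller built from a separate translation unit WITHOUT -mavx that
// declares the same prototype will use a memory-based convention for __m256
// instead (clang/gcc diagnose this with a "... changes the ABI" message), so
// the two binaries silently disagree about how the value is passed.
#include <immintrin.h>

__m256 scale_by(__m256 v, float s)
{
    return _mm256_mul_ps(v, _mm256_set1_ps(s));
}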
For Windows, matters are a bit simpler. On Windows the C++ ABI does not vary by compiler switch; the compiler switches do, however, determine whether a __m256 type is usable at all.
The Arm32/64 ABI requires use of SIMD registers for the SIMD types; Vector128<T> should only be passed in SIMD registers, etc.
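As a small illustration of that last point (a standalone sketch, not code from this PR): in native AArch64 code a 128-bit short vector is already passed and returned in a SIMD register, which is the convention the managed Vector128<T> ABI has to line up with.

// AArch64-only sketch: under the AAPCS64, float32x4_t is a 128-bit short
// vector, so 'v' arrives in v0, the scalar 's' in s1 (the next FP/SIMD
// argument register), and the result is returned in v0.
#include <arm_neon.h>

float32x4_t scale_vector(float32x4_t v, float s)
{
    return vmulq_n_f32(v, s);
}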
"no simd testing scenario"
We should have a way to disable hardware accelerated Vector and the hardware intrinsics IsSupported methods so that people can test their fallback paths. But the calling convention can stay the same, even when these are disabled.
I agree with @jkotas here - we need to have a way to test fallback paths (though I would note that this should really only impact the ability to test those fallback paths on a platform that would otherwise not require them), and it may be that changing the behavior of FEATURE_SIMD in this way will require additional changes, but if so, those changes should be made.
"We should have a way to disable hardware accelerated Vector and the hardware intrinsics IsSupported methods so that people can test their fallback paths."
In the case of HWIntrinsics, this is done via the COMPlus_Enable{ISA}=0 or COMPlus_EnableHWIntrinsic=0 flags (a small usage sketch follows at the end of this comment); see
runtime/src/coreclr/src/jit/jitconfigvalues.h
Lines 226 to 272 in fcd862e
CONFIG_INTEGER(EnableHWIntrinsic, W("EnableHWIntrinsic"), 1) // Enable Base
CONFIG_INTEGER(EnableSSE, W("EnableSSE"), 1) // Enable SSE
CONFIG_INTEGER(EnableSSE2, W("EnableSSE2"), 1) // Enable SSE2
CONFIG_INTEGER(EnableSSE3, W("EnableSSE3"), 1) // Enable SSE3
CONFIG_INTEGER(EnableSSSE3, W("EnableSSSE3"), 1) // Enable SSSE3
CONFIG_INTEGER(EnableSSE41, W("EnableSSE41"), 1) // Enable SSE41
CONFIG_INTEGER(EnableSSE42, W("EnableSSE42"), 1) // Enable SSE42
CONFIG_INTEGER(EnableAVX, W("EnableAVX"), 1) // Enable AVX
CONFIG_INTEGER(EnableAVX2, W("EnableAVX2"), 1) // Enable AVX2
CONFIG_INTEGER(EnableFMA, W("EnableFMA"), 1) // Enable FMA
CONFIG_INTEGER(EnableAES, W("EnableAES"), 1) // Enable AES
CONFIG_INTEGER(EnableBMI1, W("EnableBMI1"), 1) // Enable BMI1
CONFIG_INTEGER(EnableBMI2, W("EnableBMI2"), 1) // Enable BMI2
CONFIG_INTEGER(EnableLZCNT, W("EnableLZCNT"), 1) // Enable LZCNT
CONFIG_INTEGER(EnablePCLMULQDQ, W("EnablePCLMULQDQ"), 1) // Enable PCLMULQDQ
CONFIG_INTEGER(EnablePOPCNT, W("EnablePOPCNT"), 1) // Enable POPCNT
#else // !defined(TARGET_AMD64) && !defined(TARGET_X86)
// Enable AVX instruction set for wide operations as default
CONFIG_INTEGER(EnableAVX, W("EnableAVX"), 0)
#endif // !defined(TARGET_AMD64) && !defined(TARGET_X86)
// clang-format off
#if defined(TARGET_ARM64)
CONFIG_INTEGER(EnableHWIntrinsic, W("EnableHWIntrinsic"), 1)
CONFIG_INTEGER(EnableArm64Aes, W("EnableArm64Aes"), 1)
CONFIG_INTEGER(EnableArm64Atomics, W("EnableArm64Atomics"), 1)
CONFIG_INTEGER(EnableArm64Crc32, W("EnableArm64Crc32"), 1)
CONFIG_INTEGER(EnableArm64Dcpop, W("EnableArm64Dcpop"), 1)
CONFIG_INTEGER(EnableArm64Dp, W("EnableArm64Dp"), 1)
CONFIG_INTEGER(EnableArm64Fcma, W("EnableArm64Fcma"), 1)
CONFIG_INTEGER(EnableArm64Fp, W("EnableArm64Fp"), 1)
CONFIG_INTEGER(EnableArm64Fp16, W("EnableArm64Fp16"), 1)
CONFIG_INTEGER(EnableArm64Jscvt, W("EnableArm64Jscvt"), 1)
CONFIG_INTEGER(EnableArm64Lrcpc, W("EnableArm64Lrcpc"), 1)
CONFIG_INTEGER(EnableArm64Pmull, W("EnableArm64Pmull"), 1)
CONFIG_INTEGER(EnableArm64Sha1, W("EnableArm64Sha1"), 1)
CONFIG_INTEGER(EnableArm64Sha256, W("EnableArm64Sha256"), 1)
CONFIG_INTEGER(EnableArm64Sha512, W("EnableArm64Sha512"), 1)
CONFIG_INTEGER(EnableArm64Sha3, W("EnableArm64Sha3"), 1)
CONFIG_INTEGER(EnableArm64AdvSimd, W("EnableArm64AdvSimd"), 1)
CONFIG_INTEGER(EnableArm64AdvSimd_v81, W("EnableArm64AdvSimd_v81"), 1)
CONFIG_INTEGER(EnableArm64AdvSimd_Fp16, W("EnableArm64AdvSimd_Fp16"), 1)
CONFIG_INTEGER(EnableArm64Sm3, W("EnableArm64Sm3"), 1)
CONFIG_INTEGER(EnableArm64Sm4, W("EnableArm64Sm4"), 1)
CONFIG_INTEGER(EnableArm64Sve, W("EnableArm64Sve"), 1)
#endif // defined(TARGET_ARM64)
I don't believe we have a flag for Vector<T> today.
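As a usage note (a standalone sketch, not CoreCLR's actual configuration reader, which goes through the CLRConfig/JitConfig machinery quoted above): the COMPlus_* knobs are plain environment variables with an enabled-by-default semantic, so exercising a fallback path is just a matter of launching the test process with, for example, COMPlus_EnableAVX2=0.

// Minimal sketch of an enabled-by-default environment-variable knob; the
// knob name mirrors the table above, but the parsing here is simplified and
// is not the CoreCLR implementation.
#include <cstdlib>
#include <cstdio>

static bool IsKnobEnabled(const char* name)
{
    const char* value = std::getenv(name);
    if (value == nullptr)
    {
        return true;                  // unset means enabled, matching the default of 1 above
    }
    return std::atoi(value) != 0;     // "0" disables the corresponding code paths
}

int main()
{
    std::printf("AVX2 code paths: %s\n",
                IsKnobEnabled("COMPlus_EnableAVX2") ? "enabled" : "disabled");
    return 0;
}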
Correct; that's what FEATURE_SIMD was generally used for, and I was just indicating that we need a way to disable Vector<T> without impacting the calling convention, and that doing so will probably require additional changes.
@@ -1550,6 +1550,29 @@ void ArgIteratorTemplate<ARGITERATOR_BASE>::ComputeReturnFlags()
}
#endif

#if defined(_TARGET_AMD64_) && !defined(UNIX_AMD64_ABI)
Suggested change:
#if defined(_TARGET_AMD64_) && !defined(UNIX_AMD64_ABI)
#if defined(TARGET_AMD64) && !defined(UNIX_AMD64_ABI)
Recent rename: _TARGET_AMD64_ was recently renamed to TARGET_AMD64.
The branch was force-pushed from 38b2a67 to f4f73f2 with the commit: "The calculation of offset was based on ENREGISTERED_RETURNTYPE_MAXSIZE, but not handled symbolically."
}
else if ((norm_struct_type == TYP_SIMD32) && (vectorRegSizeForReturn == 32))
{
    // TYP_SIMD16 should be returned in YMM0
Typo: this comment should say TYP_SIMD32. (There is one more occurrence of this below.)
Auto-generated message: 69e114c, which was merged on 12/7, removed the intermediate src/coreclr/src/ folder. This PR needs to be updated because it touches files in that directory, which causes conflicts. To update your commits you can use this bash script: https://gist.github.com/ViktorHofer/6d24f62abdcddb518b4966ead5ef3783. Feel free to use the comment section of the gist to improve the script for others.
Draft Pull Request was automatically closed for inactivity. It can be manually reopened in the next 30 days if the work resumes.