
COMPlus_EnableHWIntrinsic=0 no longer disables SSE+ #35605

Closed
tannergooding opened this issue Apr 29, 2020 · 16 comments

@tannergooding (Member) commented Apr 29, 2020:

In .NET Core 3.0/3.1, setting COMPlus_EnableHWIntrinsic=0 would disable all HWIntrinsics. That is, it would mark the shared helper intrinsics (Vector64, Vector128, and Vector256) as unsupported and would also cause any platform-specific intrinsics (SSE+ on x86) to be unsupported.
However, in the current master branch, setting this only disables the shared helper intrinsics. We should likely clean this up so that the previous behavior is preserved.

That being said, it might be beneficial to change the behavior compared to .NET Core 3.1 (I had logged a bug for this a while back: #11701). That is, rather than having EnableHWIntrinsic also cause the compiler to report that SSE+ is unsupported, we should have it only impact the creation of HWIntrinsic nodes. We could do this via a mechanism similar to FeatureSIMD, which currently has a bool featureSIMD field used to exit early from the impSIMDIntrinsic code paths, and we could additionally assert on it in the gtNewSimdHWIntrinsic and gtNewScalarHWIntrinsic methods.

This would allow the compiler to continue reporting and keying off of what ISAs the hardware supports regardless of whether the user has HWIntrinsics enabled/disabled.

category:testing
theme:intrinsics
skill-level:intermediate
cost:medium

@Dotnet-GitSync-Bot added the untriaged (New issue has not been triaged by the area owner) label Apr 29, 2020
@Dotnet-GitSync-Bot (Collaborator) commented:

I couldn't figure out the best area label to add to this issue. Please help me learn by adding exactly one area label.

@tannergooding (Member Author) commented:

CC @davidwrighton, @CarolEidt, @echesakovMSFT

@tannergooding (Member Author) commented:

If everyone is fine with the suggested approach, I'm happy to put up a PR.

To reiterate, the suggested approach is:

  1. Add a new featureHWIntrinsic field which parallels the featureSIMD field
  2. Key off the field for an early exit in the relevant code paths, such as impHWIntrinsic
  3. Add an assert in gtNewSIMDHWIntrinsic and gtNewScalarHWIntrinsic that featureHWIntrinsic is enabled, to help catch other code paths that create HWIntrinsic nodes when it is disabled.

@CarolEidt added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) label Apr 29, 2020
@CarolEidt (Contributor) commented:

The approach seems reasonable, though with #35421 it would be good to understand (and document) how the two features (SIMD and HWIntrinsic) cooperate.

@tannergooding (Member Author) commented:

though with #35421 it would be good to understand (and document) how the two features (SIMD and HWIntrinsic) cooperate.

This might require having a clear understanding of what all featureSIMD covers. From what I can tell, featureSIMD was originally meant to cover just the SIMD intrinsics, while supportSIMDTypes() was meant to indicate TYP_SIMD support in general (both being under the general FEATURE_SIMD define). On ARM64 the former can be disabled while the latter always returns true, as it is required for HFA/HVA types in the ABI. However, on x64 the latter is based on the former, even though Unix x64 has similar HFA/HVA requirements. We likewise have various support (like save/restore upper) which is directly tied to GT_SIMD and so has to be coupled today.

What I would, personally, like to see is that:

  • supportSIMDTypes() is always true on platforms that support TYP_SIMD and it can't be disabled (ARM is the only existing platform that would return false).
  • EnableHWIntrinsic impacts the creation and use of GT_HWINTRINSIC anywhere in the runtime
  • FeatureSIMD (and FEATURE_SIMD) would no longer be necessary once all the work related to #35421 (Adding basic support for recognizing and handling SIMD intrinsics as HW intrinsics) is finished, so it could be removed
  • Enable{ISA} impacts codegen for the JIT in general (how it works today, modulo the "baseline" ISAs where it only impacts HWIntrinsics)
  • Introduce a new officially supported flag that allows the size of Vector<T> to be controlled

For FeatureSIMD, we could keep it around and have it control whether impSimdAsHWIntrinsic exits early, but I can't think of any cases where this would be particularly useful beyond what the existing EnableHWIntrinsic and FeatureSIMD switches already provide.

For Enable{ISA}, this would largely impact HWIntrinsics, but it would also affect things like the use of the VEX encoding on x86/x64 (and therefore TYP_SIMD32 support, since you can't support YMM if AVX isn't also supported) and other optimizations (although many of these are being driven through HWIntrinsics, sometimes via internal-only APIs). For "baseline" ISAs (SSE/SSE2 on x86/x64 and AdvSIMD on ARM64) it only impacts the creation of HWIntrinsic nodes for the relevant ISA.

For the flag controlling the size of Vector<T>: devs can indirectly control the size via EnableAVX2=0 on x86/x64 today (since we can't support integer types with just AVX). However, it's non-ideal that you have to disable AVX2 just to change the size of Vector<T>, and this approach won't work if we ever introduce AVX-512 support, where having it be the default size is likely not beneficial.
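
As a minimal illustration of that indirect control (assuming an AVX2-capable x64 machine and the current runtime behavior), the following program reports a 32-byte Vector<T> by default and a 16-byte one when launched with COMPlus_EnableAVX2=0:

using System;
using System.Numerics;
using System.Runtime.Intrinsics.X86;

class VectorSizeProbe
{
    static void Main()
    {
        // Prints True / 32 on an AVX2-capable x64 machine by default;
        // prints False / 16 when the process is launched with COMPlus_EnableAVX2=0.
        Console.WriteLine($"Avx2.IsSupported:   {Avx2.IsSupported}");
        Console.WriteLine($"Vector<byte>.Count: {Vector<byte>.Count}");
    }
}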

@BruceForstall added this to the 5.0 milestone Apr 29, 2020
@BruceForstall removed the untriaged (New issue has not been triaged by the area owner) label Apr 29, 2020
@davidwrighton (Member) commented:

I believe we have way too many knobs here; these flags are far too tied to JIT-internal behaviors and don't work well in the presence of AOT-generated code. Do we have any evidence for which combinations of flags are actually used in practice? How do we intend for these flags to be used? And if we intend for customers to use them, not just runtime developers, these flags need explicit testing.

@tannergooding (Member Author) commented:

don't work well in the presence of AOT generated code

None of the JIT configuration knobs really work "well" with AOT-generated code, as any code that has been AOT'd will completely ignore them until it is rejitted.

Do we have any evidence for what combinations of flags are actually used in practice?

I don't know if we have any metrics besides our own testing + ML.NET

How do we intend for these flags to be used?

The EnableHWIntrinsic + Enable{ISA} flags are exposed to support devs testing all the various paths their code supports. That is, if you have the following, as you might for a DotProduct implementation:

if (Sse41.IsSupported)
{
    // SSE4.1-accelerated path
}
else if (Sse3.IsSupported)
{
    // SSE3-accelerated path
}
else if (Sse.IsSupported)
{
    // Baseline SSE path
}
else
{
    // Software fallback
}

You can define a CI job with each successively lower ISA disabled and ultimately test all configurations, including the software fallback, as we do in our outerloop runs and as ML.NET does.

And if we intend for customers to use them not just runtime developers, these flags need explicit testing.

Do we have a proposal for how to test this? The runtime in general isn't very unit-testable, and we are generally stuck with inserting asserts that things "look correct".
So it isn't easy to verify that, if EnableSSE3=0 is set, no SSE3 instructions are generated.

@davidwrighton (Member) commented:

I expect that testing could work based on looking at the results of the IsSupported flag values: making sure that the appropriate IsSupported managed APIs were enabled/disabled and that the IsHardwareAccelerated API returns true/false as appropriate. Effectively, I would like a set of tests that cover the IsSupported flags and ensure that they behave as expected. Actual generation of native instructions is less of a concern from my POV.
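
A rough sketch of what such a test could look like (the EXPECT_SSE41 variable is hypothetical, just a knob for this harness; the CI job would set it alongside the corresponding COMPlus_Enable* switch before launching the process):

using System;
using System.Numerics;
using System.Runtime.Intrinsics.X86;

class IsSupportedConsistencyTest
{
    static int Main()
    {
        // Hypothetical harness knob: the CI job sets EXPECT_SSE41=0 together with
        // the runtime switch that should disable SSE4.1, then runs this check.
        bool expectSse41 = Environment.GetEnvironmentVariable("EXPECT_SSE41") != "0";

        if (Sse41.IsSupported != expectSse41)
        {
            Console.Error.WriteLine($"Sse41.IsSupported == {Sse41.IsSupported}, expected {expectSse41}");
            return 1;
        }

        // The Vector<T> acceleration report should stay consistent with the enabled ISAs.
        Console.WriteLine($"Vector.IsHardwareAccelerated: {Vector.IsHardwareAccelerated}");
        return 0;
    }
}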

In addition, I'm eventually going to get us to support the native calling conventions for the vector types. At that point, the exact set of supported instruction sets will become an issue that not only affects codegen but also has implications for the runtime, just like the current logic around Vector<T>. Do we like the way we represent that data? Does it make sense? And so on.

@danmoseley (Member) commented:

Whatever the approach, unit testing needs the environment variables to force execution down each path. Currently we don't test all paths, and as we add more ARM paths, the software path will be less tested as well. There is an issue tracking doing this better: #950

@CarolEidt (Contributor) commented:

I believe we have way too many knobs here.

This is pretty fundamental to the way we have exposed the ISAs for hardware intrinsics. Since we don't explicitly test the full range of actual hardware supported, we use these options as an alternative, though note that we've had encoding and other issues in the past that slipped by our tests.

We have a set of outerloop tests that run with the full complement of ISA options - though they don't combine those with the range of other outerloop testing we do (e.g. GC stress, jitStressRegs, etc.)

@davidwrighton (Member) commented:

Ok, let me revise my statement. I believe we have way too many knobs that are easy to misunderstand the consequences of adjusting.

There are the knobs attached to instruction sets, which are now handled consistently in the JIT, VM, and AOT compiler such that, if there are disagreements amongst the various components, the JIT runtime behavior wins out if the method is jitted. (It will be ignored for methods that have been precompiled via AOT until tiered compilation kicks in.) If we move the consequences of the Enable{Isa} switches to be visible at the runtime level as well as in the JIT, they will work with crossgen2 to produce a consistent meaning.

There is the FeatureSIMD stuff, which I don't believe actually works in a consistent fashion today, and which I thought was primarily designed to disable the Vector128/256 intrinsics.

And finally there is the EnableHWIntrinsic switch, which is the focus of this discussion. As it's described in terms of GenTree nodes, I have no idea what it's supposed to mean, as I'm not familiar with what it means to make one type of GenTree vs another.

If we add a supported switch for adjusting the Vector<T> size, we need to consider the AOT scenario and determine what changes to the R2R file format need to be made to support such a switch existing at runtime.

@tannergooding (Member Author) commented:

And finally there is the EnableHWIntrinsic switch, which is the focus of this discussion. As it's described in terms of GenTree nodes, I have no idea what it's supposed to mean, as I'm not familiar with what it means to make one type of GenTree vs another.

EnableHWIntrinsic is meant to disable all HWIntrinsics and guarantee the software fallback is covered. It provides a simpler approach than needing to remember all the baseline ISAs that exist for a given architecture or range of architectures.
If we aren't generating any HWIntrinsics, then we should never be creating a GT_HWINTRINSIC node, and so having an assert in the constructor would help catch violations of this.
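
As a minimal sketch of what that guarantee means from managed code (assuming the .NET Core 3.0/3.1 semantics described at the top of this issue, i.e. once the regression is fixed), a process launched with COMPlus_EnableHWIntrinsic=0 would report false from every IsSupported check and therefore always execute the software fallback branch:

using System;
using System.Runtime.Intrinsics.X86;

class FallbackCheck
{
    static void Main()
    {
        // With COMPlus_EnableHWIntrinsic=0 and the 3.0/3.1 behavior restored,
        // all of these report False, so code shaped like the DotProduct example
        // earlier in the thread only ever takes its software fallback branch.
        Console.WriteLine($"Sse.IsSupported:   {Sse.IsSupported}");
        Console.WriteLine($"Sse41.IsSupported: {Sse41.IsSupported}");
        Console.WriteLine($"Avx2.IsSupported:  {Avx2.IsSupported}");
    }
}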

@davidwrighton (Member) commented:

I wonder if we couldn't tweak this to be expressed entirely in terms of access to the various InstructionSets; then it becomes somewhat simpler to model across the broad spectrum of the system. For instance, EnableHWIntrinsic could disable all of the various instruction sets and then make the VM aware those bits are disabled, etc. Vector<T> size control could (on X86 and X64 platforms) control whether or not Avx2 was enabled.

Then we could have one model of communication across the system (the instruction sets) and yet still have the somewhat simplified controls of EnableHWIntrinsic and Vector<T> size control.

@tannergooding (Member Author) commented:

For instance, EnableHWIntrinsic could disable all of the various instruction sets

That's what we had in 3.0; however, this means disabling HWIntrinsics impacts the size of Vector<T> and support for the VEX encoding, which isn't necessarily desirable.

@davidwrighton (Member) commented:

Ah, makes sense. Well, if it's necessary, it's necessary. What I ask, though, is that we consider the consequences of AOT-generated code. As of my somewhat recent change, the compiler can and will generate logic which depends on these flags, and we need to build the appropriate mechanisms to disallow usage of semantically wrong code.

@tannergooding (Member Author) commented:

This was fixed back in #37882 where the SimdAsHWIntrinsic logic was updated to additionally check EnableHWIntrinsic (the regular HWIntrinsic logic was already doing the same).

@ghost locked as resolved and limited conversation to collaborators Dec 9, 2020