Allow specifying hardware support level for crossgen2 #226
Comments
Crossgen2 Instruction Set Support
This covers specification of the baseline instruction set for use in compiles via Crossgen2.
Instruction sets
Instruction sets in crossgen2 are currently handled on a minimal baseline basis, and use of intrinsics that require instructions beyond the baseline set causes the compile to not generate code and instead rely on the JIT. The intention of this work is to specify a means by which a developer can specify a new baseline to the compiler and still achieve correctness.
Problems for version resilient code generation with intrinsics and vector instructions
New Command line arguments to support changed instruction set baselines
This command line option may be specified multiple times on the command line. If there are multiple specifications for the same instruction set name, then the behavior specified rightmost on the command line shall take precedence. If no qualifier is provided for the instruction set, e.g.
Exception to the rules above
On the X86 and X64 platforms, the
TODO: What is the baseline support on
Effect of instruction set specification for crossgen2 in .NET 5.0
Supported instruction sets for crossgen2 in .NET 5.0
The naming of these instruction sets is based on the type names in the System.Runtime.Intrinsics namespace.
R2R format changes and runtime support to support changed instruction set baselines
It is expected that crossgen2 updated baseline scenarios will add an eager fixups section that specifies the layout of
Most common expected use cases
Plausible future work
|
@jkotas, does this sound like a reasonable plan? |
Worth noting a few instances of It might be worth elaborating why
Will this be sufficient for all combinations in the future? AVX-512 itself has, what looks to be, 20 ISAs: https://en.wikipedia.org/wiki/AVX-512#CPUs_with_AVX-512 |
R2R gracefully falls back to the JIT when the R2R payload is unusable. We used to throw exceptions on various mismatches in the past, but people found it very inconvenient. I believe we should fall back gracefully in this case as well. We have tracing and other mechanisms to allow diagnosing the mismatches.
Would it make sense to encode this as a fixup? The fixups are designed to encode image or method body prerequisites among other things. For example,
Nit:
An easier variant of multiversioning may be encoding method prerequisites. We know that the majority of machines out there support many of the intrinsics. How hard would it be to generate the method code with the assumption that it runs on a recent machine, record the assumptions made by the method if there are any, and then fall back to the JIT only when the assumptions are not met? |
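The method-prerequisite idea in the comment above can be sketched roughly as follows. This is an illustrative Python model only, not crossgen2 or runtime code: `PrecompiledMethod`, `choose_code`, and the instruction set names are hypothetical, and the real mechanism would be encoded through R2R fixups rather than Python sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrecompiledMethod:
    """Hypothetical record of a precompiled method body."""
    name: str
    # Instruction sets the precompiled body assumed were available.
    assumed: frozenset


def choose_code(method, cpu_supported):
    """Use the precompiled body only if every recorded assumption holds."""
    if method.assumed <= cpu_supported:
        return "precompiled"
    # At least one assumption is not met: fall back to the JIT.
    return "jit"


# Hypothetical method compiled assuming a fairly recent machine.
dot_product = PrecompiledMethod("DotProduct", frozenset({"sse4.2", "avx2"}))

recent_cpu = frozenset({"sse2", "sse4.2", "avx", "avx2"})
older_cpu = frozenset({"sse2", "sse4.2"})

print(choose_code(dot_product, recent_cpu))
print(choose_code(dot_product, older_cpu))
```

A method that records no assumptions has an empty `assumed` set and is always usable, which matches the idea that only methods relying on newer instructions need the check.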
@tannergooding @jkotas I've updated my idea on specification of instruction sets. It's now quite a bit more flexible. Also, I've removed the concept of the
@tannergooding Could you tell me if my set of implied instruction sets is correct? Of particular interest are the
@tannergooding, do we have baseline support on Arm64 for any of the instruction sets (such as
@tannergooding, is |
This section exists already. It is encoded via |
@jkotas, fantastic. I've updated the spec to discuss CORCOMPILE_IMPORT_FLAGS_EAGER. I've added a tweak to that behavior to not throw BadImageFormatException in cases of a Check failure, but otherwise I think I can use the existing functionality. |
Looks correct and matches the inheritance hierarchy (modulo Bmi1/2 which have the associated explanation).
I'm not aware of what hardware we have declared as baseline for ARM64.
Arm32 (and prior) currently has no intrinsic support and will not for 5.0. I'm unaware if we have any plans to make it happen in the future. |
Adding @dotnet/crossgen-contrib for comments. My goal is to implement something as close as possible to my first comment. |
- Add support for the --instruction-set parameter as described in #226.
  NOTE: As the ABI for Vector parameters is not yet stable, support for the --instruction-set parameter is only enabled if --inputbubble is also enabled. Parallel work to stabilize the ABI is in progress, but is not complete.
  ALSO NOTE: The names of the instruction sets are shared with mono, and don't follow the names in issue #226.
- Add concept of baseline instruction set support to R2R file format
  - Can be applied at a per method level or at the entire R2R file level
  - R2RDump support for dumping the extra data
- Refactor how support for hardware intrinsics beyond SSE2 support is handled in crossgen2
  - Add feature to the JIT to detect which hardware features are actually used
  - Tell the JIT unconditionally that SSE42+Lzcnt+Popcnt+Pclmulqdq are supported
  - But if support beyond the --instruction-set specified baseline is used, notate the method with a per-method instruction set support fixup.
  - This enables usage of many intrinsics in corelib with greater efficiency than today
  - This enables usage of SSE42 and below intrinsics safely in non-CoreLib code. Use of higher level intrinsics in non-CoreLib code will generate code which does not use the higher level intrinsic, and note that the method's code should not be used in the presence of hardware which does support greater CPU capabilities.
  - In the future a logical enhancement of this work would be to generate multiple bodies of code to handle these more complex cases.
- In combination with the --instruction-set argument, if Avx2 is enabled, then the logic gracefully adds a dependency on Avx2 capability and Vector<T> becomes usable by crossgen'd code.
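One way to picture the per-method fixup described in the list above is the sketch below. It is a simplified Python model under stated assumptions, not the actual R2R encoding: `method_fixups` and `body_usable` are hypothetical helpers, and each fixup is modelled as a pair of an instruction set name and a flag saying whether the hardware must (or must not) support it.

```python
def method_fixups(baseline, used, assumed_absent):
    """Return sorted (instruction_set, must_be_supported) pairs for one method.

    baseline:       instruction sets guaranteed by the chosen baseline
    used:           instruction sets the generated code actually relies on
    assumed_absent: instruction sets the code was compiled as if unsupported
    """
    # Anything used beyond the baseline must be present at runtime.
    required = {(name, True) for name in used - baseline}
    # Code compiled without a higher-level intrinsic must not be used
    # on hardware that does support it.
    forbidden = {(name, False) for name in assumed_absent}
    return sorted(required | forbidden)


def body_usable(fixups, cpu_supported):
    """The precompiled body is usable only if every fixup agrees with the CPU."""
    return all((name in cpu_supported) == must for name, must in fixups)


# Hypothetical method: uses sse4.2 beyond an sse2 baseline, and was
# compiled as if avx2 were not available.
fixups = method_fixups({"sse2"}, {"sse2", "sse4.2"}, {"avx2"})

print(fixups)
print(body_usable(fixups, {"sse2", "sse4.2"}))
print(body_usable(fixups, {"sse2", "sse4.2", "avx2"}))
```

When `body_usable` is false the method would be compiled by the JIT instead, which is the graceful fallback discussed earlier in the thread.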
What would the developer interface for this look like? In C++ a typical method for processor targeting would be to implement private methods of a class across several files (translation units) so that different /arch settings can be used. A .csproj build setting for a minimum instruction set seems to make sense (this already exists in a way, since an x64 platform target implies SSE2) but it's less clear to me how more granular instruction set targeting might be accomplished. For example, if I could have a project set to an AVX platform target I'd still be looking for something like [MethodImpl(InstructionSet = AVX2)] on certain methods where the AVX instruction set is limiting.
Another case I commonly encounter is register spilling due to 16 ymms being inadequate. So, even in methods keeping to 128 bit SIMD to avoid knocking the core out of turbo, current disassemblies suggest substantial performance improvement would be possible if inlining decisions could take advantage of 32 zmms. EVEX access from C# isn't currently much of a concern for me but, with increasing AVX-512 availability from Intel Ice Lake shipments, I expect this to be changing by the end of 2020.
I'm also curious about the extent to which the CPU dispatching required of .NET Core implementations might be optimized out. For example, I've ended up "reimplementing" certain System.Runtime.Intrinsics paths for AVX targets by dropping unnecessary branches. This isn't a big deal but it does carry some cost. |
@twest820 Actually, crossgen2 support has been checked in to the tool itself, and is functional now although the benefits are currently quite modest (See the --instruction-set command line argument). The current expectation is that the control will be at a module level of granularity for now, but sometime after crossgen2 replaces crossgen and intrinsics use becomes more common in the community, we will likely explore adding more granular controls. This issue remains open as the developer interface (which will be some csproj property) has not been designed and implemented. |
Hi David, thanks for the update. My experience of /arch:AVX and /arch:AVX2 is quite modest as well. Also within my experience, it's common that compute-intensive applications don't need to support 10+ or 7+ year old processors. So the return on the developer time for changing the build target remains excellent. Is there an anticipated timeframe for the developer interface? I'm kind of hearing after .NET 5. |
We're likely to ship the feature as opt in for x64 in .NET 5, in a use at your own risk manner. We expect it to function correctly in all cases, but performance will not have been tuned. |
We've enabled the ability to use crossgen2 in the sdk for .NET 5 via the
<PublishReadyToRunCrossgen2ExtraArgs>--instruction-set:avx2,bmi2,fma,pclmul,popcnt,aes</PublishReadyToRunCrossgen2ExtraArgs>
Note: These property values will be respected by the 5.0 sdk, but as support will be in preview, it is possible for the properties to change in future releases. |
@davidwrighton Thanks for the update. Seems reasonable that it will work and may change in 6.0 LTS but at least there is an escape hatch for 5.0 |
Closing since crossgen2 now supports specifying instruction-sets. |
Goals
To make an ahead-of-time compiled binary widely applicable, it cannot make assumptions about the execution environment. For example, it cannot assume that the processor used to run the program supports AVX instructions.
This is unfortunate because ahead-of-time compilation is meant for performance, yet it cannot be as performant as it could be simply because of the lack of information.
The change proposed in this issue is to remedy that: the problem is the lack of information, so we ask the user to supply it.
The abstract, high-level functional requirement:
The key challenges:
What exactly do we mean by the hardware support level?
This is tricky: different processors support different subsets of instructions, and these subsets are not totally ordered. The only way to specify the exact hardware support is to specify the subset.
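As a small illustration of why the subsets are not totally ordered, each processor can be modelled as a set of instruction set names. The two processors below and their contents are made up for illustration, and `compare` is a hypothetical helper, not part of any tool.

```python
def compare(a, b):
    """Compare two hardware support levels given as sets of instruction sets."""
    if a == b:
        return "equal"
    if a < b:   # proper subset: a supports strictly less than b
        return "less capable"
    if a > b:   # proper superset: a supports strictly more than b
        return "more capable"
    # Neither contains the other, so no single "level" can rank them.
    return "incomparable"


# Hypothetical processors, for illustration only.
cpu_a = frozenset({"sse2", "sse4.2", "avx", "avx2"})
cpu_b = frozenset({"sse2", "sse4.2", "popcnt", "lzcnt"})
baseline = frozenset({"sse2"})

print(compare(baseline, cpu_a))
print(compare(cpu_a, cpu_b))
```

Because `cpu_a` and `cpu_b` each support something the other lacks, a single ordered "level" number cannot describe both, which is why the exact subset has to be specified.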
What if the hardware is more capable than the assumed level?
Ideally, the hardware support is exactly as assumed. If it is less capable, we can bail out and refuse to load the assembly. But if it is more capable, then refusing to load the assembly seems harsh. If we do use the ready-to-run code, there could be problems as follows:
Suppose we JIT this, and ready-to-run compile ready_to_run_use_x assuming x is not supported. ready_to_run_use_x would throw PlatformNotSupportedException at runtime, which is not what we wanted.
Vector<T> size:
Suppose we run this:
If they do not agree on Vector<T> sizes, the call will not work.
This is a general problem: if we change the calling convention, then the call will not work. In general, the calling convention is something we should just never change, but it appears to me that we will, due to this:
https://github.com/dotnet/coreclr/issues/15943
AVX is likely to be much less useful if we cannot use Vector<T>.
Design
The solution to (1) is TBD; it will be a verbose and extensible format that describes the instruction subset.
The solution to (2) is that we specify a fixed Vector<T> size and also enforce the same size at runtime when the JIT asks for it.
The solution to (3) is TBD. Ideally, it is fixed so we can use Vector<T> in crossgen2.
If we zoom out a little bit, we notice that the general problem is disagreement. The past approach for ready-to-run was to solve the disagreement by shutting up (i.e. not compiling). Here I am proposing something different: we should solve the disagreement by making the JIT follow the assumption (i.e. Vector<T> size).
Currently, Vector<T> size is the only thing I want to force the JIT to follow. In particular, suppose we implemented AVX512: I don't want to stop the JIT from using it, meaning that if there is any code that uses AVX512, crossgen2 will refuse to compile it, just as it did before.
Audience
The key customers of this feature are:
This is likely to be rare in terms of the number of people. But if we could make it on the cloud, that could potentially benefit many people automatically.