Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intrinsics analyzer and fixes #85481

Merged
merged 21 commits into from
May 16, 2023
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
e1f2f17
Implement analyzer for platform intrinsics use in System.Private.CoreLib
davidwrighton Apr 25, 2023
b246bd0
Remove BypassReadyToRunAttribute as it should no longer be needed for…
davidwrighton Apr 26, 2023
35105fb
Document new intrinsic handling behavior and enhance crossgen2 to res…
davidwrighton Apr 27, 2023
59a55e4
Fix build break
davidwrighton Apr 27, 2023
768d1a9
Fix condition noted in code review
davidwrighton Apr 28, 2023
5ac974c
Update analyzer to detect more patterns, and attribute much of System…
davidwrighton May 9, 2023
cdc8b3f
Add pragmas and attributes to the packed span helpers
davidwrighton May 9, 2023
a01fd9d
Hah! It works
davidwrighton May 9, 2023
5573738
Rename attribute and fix docs
davidwrighton May 9, 2023
5134b86
Merge branch 'main' of github.com:dotnet/runtime into IntrinsicsAnaly…
davidwrighton May 9, 2023
506f053
Add analyzer tests and fixup issues from merging with main branch
davidwrighton May 9, 2023
2e33ae4
Disable tests on Mono due to Mono runtime bug
davidwrighton May 9, 2023
8a69798
Handle incorrect namespace for the CompExactlyDependsOnAttribute and …
davidwrighton May 10, 2023
919e0c3
Merge branch 'main' of github.com:dotnet/runtime into IntrinsicsAnaly…
davidwrighton May 15, 2023
4e413bd
With the analyzer, negative checks are problematic, so re-order some …
davidwrighton May 15, 2023
994ff3e
Code review feedback
davidwrighton May 15, 2023
8ed51a1
Update src/coreclr/tools/aot/ILCompiler.ReadyToRun/JitInterface/CorIn…
davidwrighton May 15, 2023
9442ebe
Update src/libraries/System.Private.CoreLib/tests/IntrinsicsInSystemP…
davidwrighton May 15, 2023
fba92eb
Update src/libraries/System.Private.CoreLib/gen/IntrinsicsInSystemPri…
davidwrighton May 15, 2023
fcc38ca
Update IntrinsicsInSystemPrivateCoreLibAnalyzer.cs
davidwrighton May 15, 2023
62c5e58
Use auto-implemented get-only property
davidwrighton May 15, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
165 changes: 47 additions & 118 deletions docs/design/coreclr/botr/vectors-and-intrinsics.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,123 +38,6 @@ For AOT compilation, the situation is far more complex. This is due to the follo
2. If AOT code is generated, it should be used unless there is an overriding reason to avoid using it.
3. It must be exceedingly difficult to misuse the AOT compilation tool to violate principle 1.

There are 2 different implementations of AOT compilation under development at this time. The crossgen1 model (which is currently supported on all platforms and architectures), and the crossgen2 model, which is under active development. Any developer wishing to use hardware intrinsics in the runtime or libraries should be aware of the restrictions imposed by the crossgen1 model. Crossgen2, which we expect will replace crossgen1 at some point in the future, has strictly fewer restrictions.

## Crossgen1 model of hardware intrinsic usage

###Code written in System.Private.CoreLib.dll
#### Crossgen implementation rules
- Any code which uses `Vector<T>` will not be compiled AOT. (See code which throws a TypeLoadException using `IDS_EE_SIMD_NGEN_DISALLOWED`)
- Code which uses Sse and Sse2 platform hardware intrinsics is always generated as it would be at jit time.
- Code which uses Sse3, Ssse3, Sse41, Sse42, Popcnt, Pclmulqdq, and Lzcnt instruction sets will be generated, but the associated IsSupported check will be a runtime check. See `FilterNamedIntrinsicMethodAttribs` for details on how this is done.
- Code which uses other instruction sets will be generated as if the processor does not support that instruction set. (For instance, a usage of Avx2.IsSupported in CoreLib will generate native code where it unconditionally returns false, and then if and when tiered compilation occurs, the function may be rejitted and have code where the property returns true.)
- Non-platform intrinsics which require more hardware support than the minimum supported hardware capability will not take advantage of that capability. In particular the code generated for `Vector2/3/4.Dot`, and `Math.Round`, and `MathF.Round`. See `FilterNamedIntrinsicMethodAttribs` for details. MethodImplOptions.AggressiveOptimization may be used to disable precompilation compilation of this sub-par code.

#### Characteristics which result from rules
The rules here provide the following characteristics.
- Some platform specific hardware intrinsics can be used in CoreLib without encountering a startup time penalty
- Some uses of platform specific hardware intrinsics will force the compiler to be unable to AOT compile the code. However, if care is taken to only use intrinsics from the Sse, Sse2, Sse3, Ssse3, Sse41, Sse42, Popcnt, Pclmulqdq, or Lzcnt instruction sets, then the code may be AOT compiled. Preventing AOT compilation may cause a startup time penalty for important scenarios.
- Use of `Vector<T>` causes runtime jit and startup time concerns because it is never precompiled. Current analysis indicates this is acceptable, but it is a perennial concern for applications with tight startup time requirements.
- AOT generated code which could take advantage of more advanced hardware support experiences a performance penalty until rejitted. (If a customer chooses to disable tiered compilation, then customer code may always run slowly).

#### Code review rules for code written in System.Private.CoreLib.dll
- Any use of a platform intrinsic in the codebase MUST be wrapped with a call to the associated IsSupported property. This wrapping MUST be done within the same function that uses the hardware intrinsic, and MUST NOT be in a wrapper function unless it is one of the intrinsics that are enabled by default for crossgen compilation of System.Private.CoreLib (See list above in the implementation rules section).
- Within a single function that uses platform intrinsics, it must behave identically regardless of whether IsSupported returns true or not. This rule is required as code inside of an IsSupported check that calls a helper function cannot assume that the helper function will itself see its use of the same IsSupported check return true. This is due to the impact of tiered compilation on code execution within the process.
- Excessive use of intrinsics may cause startup performance problems due to additional jitting, or may not achieve desired performance characteristics due to suboptimal codegen.

ACCEPTABLE Code
```csharp
using System.Runtime.Intrinsics.X86;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this example helpful; could we keep it in the crossgen2 section?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that the example is no longer correct. With the new attribute, the "Unacceptable" code is now legal to use, and the analyzer will ensure that you put the attribute on it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unacceptable code is still illegal without an attribute if I understand correctly - I found it helpful in understanding the problem that the attribute solves.


public class BitOperations
{
public static int PopCount(uint value)
{
if (Avx2.IsSupported)
{
Some series of Avx2 instructions that performs the popcount operation.
}
else
return FallbackPath(input);
}

private static int FallbackPath(uint)
{
const uint c1 = 0x_55555555u;
const uint c2 = 0x_33333333u;
const uint c3 = 0x_0F0F0F0Fu;
const uint c4 = 0x_01010101u;

value -= (value >> 1) & c1;
value = (value & c2) + ((value >> 2) & c2);
value = (((value + (value >> 4)) & c3) * c4) >> 24;

return (int)value;
}
}
```

UNACCEPTABLE code
```csharp
using System.Runtime.Intrinsics.X86;

public class BitOperations
{
public static int PopCount(uint value)
{
if (Avx2.IsSupported)
return UseAvx2(value);
else
return FallbackPath(input);
}

private static int FallbackPath(uint)
{
const uint c1 = 0x_55555555u;
const uint c2 = 0x_33333333u;
const uint c3 = 0x_0F0F0F0Fu;
const uint c4 = 0x_01010101u;

value -= (value >> 1) & c1;
value = (value & c2) + ((value >> 2) & c2);
value = (((value + (value >> 4)) & c3) * c4) >> 24;

return (int)value;
}

private static int UseAvx2(uint value)
{
// THIS IS A BUG!!!!!
Some series of Avx2 instructions that performs the popcount operation.
The bug here is triggered by the presence of tiered compilation and R2R. The R2R version
of this method may be compiled as if the Avx2 feature is not available, and is not reliably rejitted
at the same time as the PopCount function.

As a special note, on the x86 and x64 platforms, this generally unsafe pattern may be used
with the Sse, Sse2, Sse3, Sssse3, Ssse41 and Sse42 instruction sets as those instruction sets
are treated specially by both crossgen1 and crossgen2 when compiling System.Private.CoreLib.dll.
}
}
```

### Code written in other assemblies (both first and third party)

#### Crossgen implementation rules
- Any code which uses an intrinsic from the `System.Runtime.Intrinsics.Arm` or `System.Runtime.Intrinsics.X86` namespace will not be compiled AOT. (See code which throws a TypeLoadException using `IDS_EE_HWINTRINSIC_NGEN_DISALLOWED`)
- Any code which uses `Vector<T>` will not be compiled AOT. (See code which throws a TypeLoadException using `IDS_EE_SIMD_NGEN_DISALLOWED`)
- Any code which uses `Vector64<T>`, `Vector128<T>`, `Vector256<T>`, or `Vector512<T>` will not be compiled AOT. (See code which throws a TypeLoadException using `IDS_EE_HWINTRINSIC_NGEN_DISALLOWED`)
- Non-platform intrinsics which require more hardware support than the minimum supported hardware capability will not take advantage of that capability. In particular the code generated for Vector2/3/4 is sub-optimal. MethodImplOptions.AggressiveOptimization may be used to disable compilation of this sub-par code.

#### Characteristics which result from rules
The rules here provide the following characteristics.
- Use of platform specific hardware intrinsics causes runtime jit and startup time concerns.
- Use of `Vector<T>` causes runtime jit and startup time concerns
- AOT generated code which could take advantage of more advanced hardware support experiences a performance penalty until rejitted. (If a customer chooses to disable tiered compilation, then customer code may always run slowly).

#### Code review rules for use of platform intrinsics
- Any use of a platform intrinsic in the codebase SHOULD be wrapped with a call to the associated IsSupported property. This wrapping may be done within the same function that uses the hardware intrinsic, but this is not required as long as the programmer can control all entrypoints to a function that uses the hardware intrinsic.
- If an application developer is highly concerned about startup performance, developers should avoid use of all platform specific hardware intrinsics on startup paths.

## Crossgen2 model of hardware intrinsic usage
There are 2 sets of instruction sets known to the compiler.
- The baseline instruction set which defaults to (Sse, Sse2), but may be adjusted via compiler option.
Expand All @@ -178,9 +61,55 @@ Code will be compiled using the optimistic instruction set to drive compilation,
- If an application developer is highly concerned about startup performance, developers should avoid use intrinsics beyond Sse42, or should use Crossgen with an updated baseline instruction set support.

### Crossgen2 adjustment to rules for System.Private.CoreLib.dll
Since System.Private.CoreLib.dll is known to be code reviewed with the code review rules as written above for crossgen1 with System.Private.CoreLib.dll, it is possible to relax rule "Code which attempts to use instruction sets outside of the optimistic set will generate code that will not be used on machines with support for the instruction set." What this will do is allow the generation of non-optimal code for these situations, but through the magic of code review, the generated logic will still work correctly.
Since System.Private.CoreLib.dll is known to be code reviewed with the code review rules as written above for crossgen1 with System.Private.CoreLib.dll, it is possible to relax rule "Code which attempts to use instruction sets outside of the optimistic set will generate code that will not be used on machines with support for the instruction set." What this will do is allow the generation of non-optimal code for these situations, but through the magic of code review and analyzers, the generated logic will still work correctly.
sbomer marked this conversation as resolved.
Show resolved Hide resolved

#### Code review rules for code written in System.Private.CoreLib.dll
- Any use of a platform intrinsic in the codebase MUST be wrapped with a call to an associated IsSupported property. This wrapping MUST be done within the same function that uses the hardware intrinsic, OR the function which uses the platform intrinsic must have the `BypassReadyToRunForIntrinsicsHelperUse` attribute used to indicate that this function will unconditionally call platform intrinsics of from some type.
- Within a single function that uses platform intrinsics, unless marked with the `BypassReadyToRunForIntrinsicsHelperUse` attribute it must behave identically regardless of whether IsSupported returns true or not. This allows the R2R compiler to compile with a lower set of intrinsics support, and yet expect that the behavior of the function will remain unchanged in the presence of tiered compilation.
- Excessive use of intrinsics may cause startup performance problems due to additional jitting, or may not achieve desired performance characteristics due to suboptimal codegen. To fix this, we may, in the future, change the compilation rules to compile the methods marked with`BypassReadyToRunForIntrinsicsHelperUse` with the appropriate platform intrinsics enabled.

Correct use of the `IsSupported` properties and `BypassReadyToRunForIntrinsicsHelperUse` attribute is checked by an analyzer during build of `System.Private.CoreLib`. This analyzer requires that all usage of `IsSupported` properties conform to a few specific patterns. These patterns are supported via either if statements or the ternary operator.

The supported conditional checks are

1. Simple if statement checking IsSupported flag surrounding usage
```
if (PlatformIntrinsicType.IsSupported)
{
PlatformIntrinsicType.IntrinsicMethod();
}
```

2. If statement check checking a platform intrinsic type which implies
that the intrinsic used is supported.

```
if (Avx2.X64.IsSupported)
{
Avx2.IntrinsicMethod();
}
```

3. Nested if statement where there is an outer condition which is an
OR'd together series of IsSupported checks for mutually exclusive
conditions and where the inner check is an else clause where some checks
are excluded from applying.

```
if (Avx2.IsSupported || ArmBase.IsSupported)
{
if (Avx2.IsSupported)
{
// Do something
}
else
{
ArmBase.IntrinsicMethod();
}
}
```

The behavior of the `BypassReadyToRunForIntrinsicsHelperUse` is that 1 or more attributes may be applied to a given method. If any of the types specified via the attribute will not have an invariant result for its associated `IsSupported` property at runtime, then the method will not be compiled or inlined into another function during R2R compilation. If no type so described will have a true result for the `IsSupported` method, then the method will not be compiled or inlined into another function during R2R compilation.
# Mechanisms in the JIT to generate correct code to handle varied instruction set support

The JIT receives flags which instruct it on what instruction sets are valid to use, and has access to a new jit interface api `notifyInstructionSetUsage(isa, bool supportBehaviorRequired)`.
Expand Down
Loading