Hardware instruction set support for crossgen2 (#33274)
- Add support for the --instruction-set parameter as described in #226.
NOTE: As the ABI for Vector parameters is not yet stable, support for the --instruction-set parameter is only enabled if --inputbubble is also enabled. Parallel work to stabilize the ABI is in progress, but is not complete.
ALSO NOTE: The names of the instruction sets are shared with Mono, and don't follow the names in issue #226.
- Add concept of baseline instruction set support to R2R file format 
  - Can be applied at a per-method level or at the entire R2R file level
  - R2RDump support for dumping the extra data
- Refactor how support for hardware intrinsics beyond SSE2 is handled in crossgen2
- Add feature to the JIT to detect which hardware features are actually used
  - Tell the JIT unconditionally that SSE42+Lzcnt+Popcnt+Pclmulqdq are supported
  - But if support beyond the baseline specified by --instruction-set is used, annotate the method with a per-method instruction set support fixup.
  - This enables usage of many intrinsics in corelib with greater efficiency than today
  - This enables usage of SSE42 and below intrinsics safely in non-CoreLib code. Use of higher-level intrinsics in non-CoreLib code generates code that does not use the higher-level intrinsic, and records that the method's code must not be used on hardware that does support the greater CPU capabilities.
  - In the future a logical enhancement of this work would be to generate multiple bodies of code to handle these more complex cases.
  - In combination with the --instruction-set argument, if Avx2 is enabled, then the logic gracefully adds a dependency on Avx2 capability and Vector<T> becomes usable by crossgen'd code.
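
The dependency tracking described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `IsaTracker`, the `InstructionSet` enum values, and the `notifications()` log are hypothetical names invented for illustration, and the log stands in for whatever the JIT actually reports to crossgen2. It mirrors the shape of `compExactlyDependsOn`, `compOpportunisticallyDependsOn`, and `compVerifyInstructionSetUnuseable` from the compiler.h changes in this commit, but it is not the real implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for CORINFO_InstructionSet; values are illustrative only.
enum InstructionSet : unsigned { ISA_SSE42 = 0, ISA_AVX = 1, ISA_AVX2 = 2 };

// Hypothetical model of the JIT-side bookkeeping; not the real Compiler class.
class IsaTracker
{
public:
    explicit IsaTracker(uint64_t supportedMask) : m_supported(supportedMask) {}

    // Exact query: the answer is recorded whether it is true or false, so the
    // compiled method becomes tied to the ISA being present or absent.
    bool exactlyDependsOn(InstructionSet isa)
    {
        const uint64_t bit       = 1ULL << isa;
        const bool     supported = (m_supported & bit) != 0;
        if ((m_reported & bit) == 0)
        {
            m_notifications.emplace_back(isa, supported); // recorded once per ISA
            m_reported |= bit;
        }
        return supported;
    }

    // Opportunistic query: only a positive answer is recorded. A negative
    // answer leaves no trace, so the code stays valid on better hardware.
    bool opportunisticallyDependsOn(InstructionSet isa)
    {
        if ((m_supported & (1ULL << isa)) != 0)
        {
            return exactlyDependsOn(isa);
        }
        return false;
    }

    // Record that the code relies on the ISA being absent.
    void verifyUnusable(InstructionSet isa)
    {
        const bool usable = exactlyDependsOn(isa);
        assert(!usable);
        (void)usable; // avoid an unused-variable warning when asserts are disabled
    }

    const std::vector<std::pair<InstructionSet, bool>>& notifications() const
    {
        return m_notifications;
    }

private:
    uint64_t m_supported;
    uint64_t m_reported = 0;
    std::vector<std::pair<InstructionSet, bool>> m_notifications;
};
```

In this model, a tracker built with only the SSE42 bit set returns false from `opportunisticallyDependsOn(ISA_AVX2)` without logging anything, whereas `verifyUnusable(ISA_AVX2)` logs an explicit "AVX2 must be absent" entry. That asymmetry is the reason the diff uses the opportunistic query for optimizations and the exact query where the generated code would be wrong on hardware with more capabilities.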
davidwrighton authored Apr 3, 2020
1 parent c21c7fd commit 5ac25ac
Showing 54 changed files with 1,775 additions and 503 deletions.
1 change: 1 addition & 0 deletions docs/design/coreclr/botr/README.md
@@ -29,6 +29,7 @@ Below is a table of contents.
- [Cross-platform Minidumps](xplat-minidump-generation.md)
- [Mixed Mode Assemblies](mixed-mode.md)
- [Guide For Porting](guide-for-porting.md)
- [Vectors and Intrinsics](vectors-and-intrinsics.md)


It may be possible that this table is not complete. You can get a complete list
1 change: 1 addition & 0 deletions docs/design/coreclr/botr/readytorun-format.md
@@ -246,6 +246,7 @@ fixup kind, the rest of the signature varies based on the fixup kind.
| READYTORUN_FIXUP_DeclaringTypeHandle | 0x2D | Dictionary lookup for method declaring type. Followed by the type signature.
| READYTORUN_FIXUP_IndirectPInvokeTarget | 0x2E | Target (indirect) of an inlined PInvoke. Followed by method signature.
| READYTORUN_FIXUP_PInvokeTarget | 0x2F | Target of an inlined PInvoke. Followed by method signature.
| READYTORUN_FIXUP_Check_InstructionSetSupport | 0x30 | Specify the instruction sets that must be supported/unsupported to use the R2R code associated with the fixup.
| READYTORUN_FIXUP_ModuleOverride | 0x80 | When or-ed to the fixup ID, the fixup byte in the signature is followed by an encoded uint with assemblyref index, either within the MSIL metadata of the master context module for the signature or within the manifest metadata R2R header table (used in cases inlining brings in references to assemblies not seen in the input MSIL).

#### Method Signatures
201 changes: 201 additions & 0 deletions docs/design/coreclr/botr/vectors-and-intrinsics.md

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions src/coreclr/src/inc/corcompile.h
@@ -691,6 +691,8 @@ enum CORCOMPILE_FIXUP_BLOB_KIND
ENCODE_INDIRECT_PINVOKE_TARGET, /* For calling a pinvoke method ptr indirectly */
ENCODE_PINVOKE_TARGET, /* For calling a pinvoke method ptr */

ENCODE_CHECK_INSTRUCTION_SET_SUPPORT, /* Define the set of instruction sets that must be supported/unsupported to use the fixup */

ENCODE_MODULE_HANDLE = 0x50, /* Module token */
ENCODE_STATIC_FIELD_ADDRESS, /* For accessing a static field */
ENCODE_MODULE_ID_FOR_STATICS, /* For accessing static fields */
61 changes: 61 additions & 0 deletions src/coreclr/src/inc/corinfoinstructionset.h
@@ -428,4 +428,65 @@ inline const char *InstructionSetToString(CORINFO_InstructionSet instructionSet)
#endif
}

inline CORINFO_InstructionSet InstructionSetFromR2RInstructionSet(ReadyToRunInstructionSet r2rSet)
{
#ifdef _MSC_VER
#pragma warning(push)
#pragma warning(disable: 4065) // disable warning for switch statement with only default label.
#endif

switch (r2rSet)
{
#ifdef TARGET_ARM64
case READYTORUN_INSTRUCTION_ArmBase: return InstructionSet_ArmBase;
case READYTORUN_INSTRUCTION_AdvSimd: return InstructionSet_AdvSimd;
case READYTORUN_INSTRUCTION_Aes: return InstructionSet_Aes;
case READYTORUN_INSTRUCTION_Crc32: return InstructionSet_Crc32;
case READYTORUN_INSTRUCTION_Sha1: return InstructionSet_Sha1;
case READYTORUN_INSTRUCTION_Sha256: return InstructionSet_Sha256;
case READYTORUN_INSTRUCTION_Atomics: return InstructionSet_Atomics;
#endif // TARGET_ARM64
#ifdef TARGET_AMD64
case READYTORUN_INSTRUCTION_Sse: return InstructionSet_SSE;
case READYTORUN_INSTRUCTION_Sse2: return InstructionSet_SSE2;
case READYTORUN_INSTRUCTION_Sse3: return InstructionSet_SSE3;
case READYTORUN_INSTRUCTION_Ssse3: return InstructionSet_SSSE3;
case READYTORUN_INSTRUCTION_Sse41: return InstructionSet_SSE41;
case READYTORUN_INSTRUCTION_Sse42: return InstructionSet_SSE42;
case READYTORUN_INSTRUCTION_Avx: return InstructionSet_AVX;
case READYTORUN_INSTRUCTION_Avx2: return InstructionSet_AVX2;
case READYTORUN_INSTRUCTION_Aes: return InstructionSet_AES;
case READYTORUN_INSTRUCTION_Bmi1: return InstructionSet_BMI1;
case READYTORUN_INSTRUCTION_Bmi2: return InstructionSet_BMI2;
case READYTORUN_INSTRUCTION_Fma: return InstructionSet_FMA;
case READYTORUN_INSTRUCTION_Lzcnt: return InstructionSet_LZCNT;
case READYTORUN_INSTRUCTION_Pclmulqdq: return InstructionSet_PCLMULQDQ;
case READYTORUN_INSTRUCTION_Popcnt: return InstructionSet_POPCNT;
#endif // TARGET_AMD64
#ifdef TARGET_X86
case READYTORUN_INSTRUCTION_Sse: return InstructionSet_SSE;
case READYTORUN_INSTRUCTION_Sse2: return InstructionSet_SSE2;
case READYTORUN_INSTRUCTION_Sse3: return InstructionSet_SSE3;
case READYTORUN_INSTRUCTION_Ssse3: return InstructionSet_SSSE3;
case READYTORUN_INSTRUCTION_Sse41: return InstructionSet_SSE41;
case READYTORUN_INSTRUCTION_Sse42: return InstructionSet_SSE42;
case READYTORUN_INSTRUCTION_Avx: return InstructionSet_AVX;
case READYTORUN_INSTRUCTION_Avx2: return InstructionSet_AVX2;
case READYTORUN_INSTRUCTION_Aes: return InstructionSet_AES;
case READYTORUN_INSTRUCTION_Bmi1: return InstructionSet_BMI1;
case READYTORUN_INSTRUCTION_Bmi2: return InstructionSet_BMI2;
case READYTORUN_INSTRUCTION_Fma: return InstructionSet_FMA;
case READYTORUN_INSTRUCTION_Lzcnt: return InstructionSet_LZCNT;
case READYTORUN_INSTRUCTION_Pclmulqdq: return InstructionSet_PCLMULQDQ;
case READYTORUN_INSTRUCTION_Popcnt: return InstructionSet_POPCNT;
#endif // TARGET_X86

default:
return InstructionSet_ILLEGAL;
}
#ifdef _MSC_VER
#pragma warning(pop)
#endif
}

#endif // CORINFOINSTRUCTIONSET_H
5 changes: 5 additions & 0 deletions src/coreclr/src/inc/corjitflags.h
@@ -151,6 +151,11 @@ class CORJIT_FLAGS
instructionSetFlags.AddInstructionSet(instructionSet);
}

bool IsSet(CORINFO_InstructionSet instructionSet) const
{
return instructionSetFlags.HasInstructionSet(instructionSet);
}

void Clear(CORINFO_InstructionSet instructionSet)
{
instructionSetFlags.RemoveInstructionSet(instructionSet);
4 changes: 4 additions & 0 deletions src/coreclr/src/inc/readytorun.h
@@ -202,6 +202,8 @@ enum ReadyToRunFixupKind

READYTORUN_FIXUP_IndirectPInvokeTarget = 0x2E, /* Target (indirect) of an inlined pinvoke */
READYTORUN_FIXUP_PInvokeTarget = 0x2F, /* Target of an inlined pinvoke */

READYTORUN_FIXUP_Check_InstructionSetSupport= 0x30, /* Define the set of instruction sets that must be supported/unsupported to use the fixup */
};

//
@@ -359,6 +361,8 @@ enum ReadyToRunHelper
READYTORUN_HELPER_StackProbe = 0x111,
};

#include "readytoruninstructionset.h"

//
// Exception info
//
4 changes: 2 additions & 2 deletions src/coreclr/src/jit/codegenarm64.cpp
@@ -2723,7 +2723,7 @@ void CodeGen::genLockedInstructions(GenTreeOp* treeNode)

emitAttr dataSize = emitActualTypeSize(data);

if (compiler->compSupports(InstructionSet_Atomics))
if (compiler->compOpportunisticallyDependsOn(InstructionSet_Atomics))
{
assert(!data->isContainedIntOrIImmed());

@@ -2860,7 +2860,7 @@ void CodeGen::genCodeForCmpXchg(GenTreeCmpXchg* treeNode)
genConsumeRegs(data);
genConsumeRegs(comparand);

if (compiler->compSupports(InstructionSet_Atomics))
if (compiler->compOpportunisticallyDependsOn(InstructionSet_Atomics))
{
emitAttr dataSize = emitActualTypeSize(data);

2 changes: 1 addition & 1 deletion src/coreclr/src/jit/codegenxarch.cpp
@@ -7256,7 +7256,7 @@ void CodeGen::genSSE2BitwiseOp(GenTree* treeNode)
void CodeGen::genSSE41RoundOp(GenTreeOp* treeNode)
{
// i) SSE4.1 is supported by the underlying hardware
assert(compiler->compSupports(InstructionSet_SSE41));
assert(compiler->compIsaSupportedDebugOnly(InstructionSet_SSE41));

// ii) treeNode oper is a GT_INTRINSIC
assert(treeNode->OperGet() == GT_INTRINSIC);
5 changes: 3 additions & 2 deletions src/coreclr/src/jit/compiler.cpp
@@ -2194,10 +2194,11 @@ void Compiler::compSetProcessor()
#endif // TARGET_X86

CORINFO_InstructionSetFlags instructionSetFlags = jitFlags.GetInstructionSetFlags();
opts.compSupportsISA = 0;
opts.compSupportsISAReported = 0;

#ifdef TARGET_XARCH
// Instruction set flags for Intel hardware intrinsics
opts.compSupportsISA = 0;
bool avxSupported = false;

if (JitConfig.EnableHWIntrinsic())
{
111 changes: 93 additions & 18 deletions src/coreclr/src/jit/compiler.h
Original file line number Diff line number Diff line change
Expand Up @@ -7648,12 +7648,12 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
SIMDLevel getSIMDSupportLevel()
{
#if defined(TARGET_XARCH)
if (compSupports(InstructionSet_AVX2))
if (compOpportunisticallyDependsOn(InstructionSet_AVX2))
{
return SIMD_AVX2_Supported;
}

if (compSupports(InstructionSet_SSE42))
if (compOpportunisticallyDependsOn(InstructionSet_SSE42))
{
return SIMD_SSE4_Supported;
}
Expand Down Expand Up @@ -7799,7 +7799,7 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
unreached();
}
}
assert(emitTypeSize(simdType) <= maxSIMDStructBytes());
assert(emitTypeSize(simdType) <= largestEnregisterableStructSize());
switch (simdBaseType)
{
case TYP_FLOAT:
Expand Down Expand Up @@ -8045,13 +8045,6 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
// Whether SIMD vector occupies part of SIMD register.
// SSE2: vector2f/3f are considered sub register SIMD types.
// AVX: vector2f, 3f and 4f are all considered sub register SIMD types.
bool isSubRegisterSIMDType(CORINFO_CLASS_HANDLE typeHnd)
{
unsigned sizeBytes = 0;
var_types baseType = getBaseTypeAndSizeOfSIMDType(typeHnd, &sizeBytes);
return (baseType == TYP_FLOAT) && (sizeBytes < getSIMDVectorRegisterByteLength());
}

bool isSubRegisterSIMDType(GenTreeSIMD* simdNode)
{
return (simdNode->gtSIMDSize < getSIMDVectorRegisterByteLength());
Expand All @@ -8068,6 +8061,8 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
}
else
{
// Verify and record that AVX2 isn't supported
compVerifyInstructionSetUnuseable(InstructionSet_AVX2);
assert(getSIMDSupportLevel() >= SIMD_SSE2_Supported);
return TYP_SIMD16;
}
Expand Down Expand Up @@ -8108,6 +8103,9 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
else
{
assert(getSIMDSupportLevel() >= SIMD_SSE2_Supported);

// Verify and record that AVX2 isn't supported
compVerifyInstructionSetUnuseable(InstructionSet_AVX2);
return XMM_REGSIZE_BYTES;
}
#elif defined(TARGET_ARM64)
Expand All @@ -8128,7 +8126,7 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
unsigned int maxSIMDStructBytes()
{
#if defined(FEATURE_HW_INTRINSICS) && defined(TARGET_XARCH)
if (compSupports(InstructionSet_AVX))
if (compOpportunisticallyDependsOn(InstructionSet_AVX))
{
return YMM_REGSIZE_BYTES;
}
Expand All @@ -8141,6 +8139,7 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
return getSIMDVectorRegisterByteLength();
#endif
}

unsigned int minSIMDStructBytes()
{
return emitTypeSize(TYP_SIMD8);
Expand Down Expand Up @@ -8200,14 +8199,40 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
unsigned largestEnregisterableStructSize()
{
#ifdef FEATURE_SIMD
unsigned vectorRegSize = getSIMDVectorRegisterByteLength();
#if defined(FEATURE_HW_INTRINSICS) && defined(TARGET_XARCH)
if (opts.IsReadyToRun())
{
// Return constant instead of maxSIMDStructBytes, as maxSIMDStructBytes performs
// checks that are affected by the current level of instruction set support and would
// otherwise cause the highest level of instruction set support to be reported to crossgen2.
// This api is only ever used as an optimization or assert, so no reporting should
// ever happen.
return YMM_REGSIZE_BYTES;
}
#endif // defined(FEATURE_HW_INTRINSICS) && defined(TARGET_XARCH)
unsigned vectorRegSize = maxSIMDStructBytes();
assert(vectorRegSize >= TARGET_POINTER_SIZE);
return vectorRegSize;
#else // !FEATURE_SIMD
return TARGET_POINTER_SIZE;
#endif // !FEATURE_SIMD
}

// Use to determine if a struct *might* be a SIMD type. As this function only takes a size, many
// structs will fit the criteria.
bool structSizeMightRepresentSIMDType(size_t structSize)
{
#ifdef FEATURE_SIMD
// Do not use maxSIMDStructBytes as that api in R2R on X86 and X64 may notify the JIT
// about the size of a struct under the assumption that the struct size needs to be recorded.
// By using largestEnregisterableStructSize here, the detail of whether or not Vector256<T> is
// enregistered or not will not be messaged to the R2R compiler.
return (structSize >= minSIMDStructBytes()) && (structSize <= largestEnregisterableStructSize());
#else
return false;
#endif // FEATURE_SIMD
}

#ifdef FEATURE_SIMD
static bool vnEncodesResultTypeForSIMDIntrinsic(SIMDIntrinsicID intrinsicId);
#endif // !FEATURE_SIMD
Expand Down Expand Up @@ -8285,21 +8310,74 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
return false;
}

bool compSupports(CORINFO_InstructionSet isa) const
#ifdef DEBUG
// Answer the question: Is a particular ISA supported?
// Use this api when asking the question so that future
// ISA questions can be asked correctly or when asserting
// support/nonsupport for an instruction set
bool compIsaSupportedDebugOnly(CORINFO_InstructionSet isa) const
{
#if defined(TARGET_XARCH) || defined(TARGET_ARM64)
return (opts.compSupportsISA & (1ULL << isa)) != 0;
#else
return false;
#endif
}
#endif // DEBUG

void notifyInstructionSetUsage(CORINFO_InstructionSet isa, bool supported) const;

// Answer the question: Is a particular ISA supported?
// The result of this api call will exactly match the target machine
// on which the function is executed (except for CoreLib, where there are special rules)
bool compExactlyDependsOn(CORINFO_InstructionSet isa) const
{

#if defined(TARGET_XARCH) || defined(TARGET_ARM64)
uint64_t isaBit = (1ULL << isa);
bool isaSupported = (opts.compSupportsISA & (1ULL << isa)) != 0;
if ((opts.compSupportsISAReported & isaBit) == 0)
{
notifyInstructionSetUsage(isa, isaSupported);
((Compiler*)this)->opts.compSupportsISAReported |= isaBit;
}

return isaSupported;
#else
return false;
#endif
}

// Ensure that code will not execute if an instruction set is useable. Call only
// if the instruction set has previously reported as unuseable, but when
// that status has not yet been recorded to the AOT compiler
void compVerifyInstructionSetUnuseable(CORINFO_InstructionSet isa)
{
// use compExactlyDependsOn to capture and record the use of the isa
bool isaUseable = compExactlyDependsOn(isa);
// Assert that the isa is unuseable. If it is useable, this function should never have been called.
assert(!isaUseable);
}

// Answer the question: Is a particular ISA supported?
// The result of this api call will match the target machine if the result is true
// If the result is false, then the target machine may still have support for the instruction set
bool compOpportunisticallyDependsOn(CORINFO_InstructionSet isa) const
{
if ((opts.compSupportsISA & (1ULL << isa)) != 0)
{
return compExactlyDependsOn(isa);
}
else
{
return false;
}
}

bool canUseVexEncoding() const
{
#ifdef TARGET_XARCH
return compSupports(InstructionSet_AVX);
return compOpportunisticallyDependsOn(InstructionSet_AVX);
#else
return false;
#endif
Expand Down Expand Up @@ -8394,14 +8472,11 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
{
JitFlags* jitFlags; // all flags passed from the EE

#if defined(TARGET_XARCH) || defined(TARGET_ARM64)
uint64_t compSupportsISA;
#endif
uint64_t compSupportsISAReported;
void setSupportedISAs(CORINFO_InstructionSetFlags isas)
{
#if defined(TARGET_XARCH) || defined(TARGET_ARM64)
compSupportsISA = isas.GetFlagsRaw();
#endif
}

unsigned compFlags; // method attributes
4 changes: 2 additions & 2 deletions src/coreclr/src/jit/gentree.cpp
@@ -17066,7 +17066,7 @@ GenTree* Compiler::gtGetSIMDZero(var_types simdType, var_types baseType, CORINFO
switch (simdType)
{
case TYP_SIMD16:
if (compSupports(InstructionSet_SSE))
if (compExactlyDependsOn(InstructionSet_SSE))
{
// We only return the HWIntrinsicNode if SSE is supported, since it is possible for
// the user to disable the SSE HWIntrinsic support via the COMPlus configuration knobs
@@ -17075,7 +17075,7 @@ GenTree* Compiler::gtGetSIMDZero(var_types simdType, var_types baseType, CORINFO
}
return nullptr;
case TYP_SIMD32:
if (compSupports(InstructionSet_AVX))
if (compExactlyDependsOn(InstructionSet_AVX))
{
// We only return the HWIntrinsicNode if AVX is supported, since it is possible for
// the user to disable the AVX HWIntrinsic support via the COMPlus configuration knobs
2 changes: 1 addition & 1 deletion src/coreclr/src/jit/hwintrinsic.cpp
@@ -309,7 +309,7 @@ NamedIntrinsic HWIntrinsicInfo::lookupId(Compiler* comp,
return NI_Illegal;
}

bool isIsaSupported = comp->compSupports(isa) && comp->compSupportsHWIntrinsic(isa);
bool isIsaSupported = comp->compExactlyDependsOn(isa) && comp->compSupportsHWIntrinsic(isa);

if (strcmp(methodName, "get_IsSupported") == 0)
{