-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding EVEX encoding logic for RR/AM pathways #77419
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsThis is a draft PR created for verifying some configs.
|
bf2025a
to
7468f73
Compare
@tannergooding @BruceForstall @anthonycanino These are the next set of changes for enabling EVEX encoding paths. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some minor questions and suggestions
@@ -29,117 +29,117 @@ | |||
#endif | |||
/*****************************************************************************/ | |||
#ifndef INST0 | |||
#define INST0(id, nm, um, mr, flags) | |||
#define INST0(id, nm, um, mr, tt, flags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment for tt
in the comment section at the top of the file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/coreclr/jit/instr.h
Outdated
@@ -330,6 +339,31 @@ enum insBarrier : unsigned | |||
}; | |||
#endif | |||
|
|||
#if defined(TARGET_XARCH) | |||
// Represents tupletype attribute of instruction. | |||
// This is used in determining factor N while calculated compressed displacement in EVEX encoding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// This is used in determining factor N while calculated compressed displacement in EVEX encoding | |
// This is used in determining factor N while calculating compressed displacement in EVEX encoding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// Represents tupletype attribute of instruction. | ||
// This is used in determining factor N while calculated compressed displacement in EVEX encoding | ||
// Reference: Section 2.6.5 in Intel 64 and ia-32 architectures software developer's manual volume 2. | ||
enum insTupleType : uint32_t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why the 0xF00 bits are not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops. Yeah that's a mistake. Fixed
@@ -1789,7 +1789,7 @@ instruction CodeGen::ins_MathOp(genTreeOps oper, var_types type) | |||
} | |||
|
|||
// Conversions to or from floating point values | |||
instruction CodeGen::ins_FloatConv(var_types to, var_types from) | |||
instruction CodeGen::ins_FloatConv(var_types to, var_types from, emitAttr attr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document attr
? (Best would be to add a new-style function header comment that documents all the arguments and the function behavior)
Is this newly required for AVX-512? Does it change any existing behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. It's required now because we divided INS_cvtsi2ss to INS_cvtsi2ss32 and INS_cvtsi2ss64. Similar for INS_cvtsi2sd. This is required because these instruction have some specific characteristic depending on if input is m32 or m64(W bit set or not)
src/coreclr/jit/instr.cpp
Outdated
@@ -1867,7 +1887,7 @@ instruction CodeGen::ins_MathOp(genTreeOps oper, var_types type) | |||
} | |||
} | |||
|
|||
instruction CodeGen::ins_FloatConv(var_types to, var_types from) | |||
instruction CodeGen::ins_FloatConv(var_types to, var_types from, emitAttr attr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be best to ifdef this function declaration so you don't touch the ARM implementation which doesn't use this argument and for which it might be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -7015,7 +7015,7 @@ void CodeGen::genIntToFloatCast(GenTree* treeNode) | |||
|
|||
// Note that here we need to specify srcType that will determine | |||
// the size of source reg/mem operand and rex.w prefix. | |||
instruction ins = ins_FloatConv(dstType, TYP_INT); | |||
instruction ins = ins_FloatConv(dstType, TYP_INT, emitTypeSize(srcType)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other cases use emitTypeSize(dstType)
but this uses srcType
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah.. should be srcType . We are trying to differentiate between VCVTSI2SS xmm1, xmm2, m32
and VCVTSI2SS xmm1, xmm2, m64
src/coreclr/jit/emitxarch.cpp
Outdated
inline bool hasTupleTypeInfo(instruction ins) | ||
{ | ||
assert((unsigned)ins < ArrLen(insTupleTypeInfos)); | ||
return ((insTupleTypeInfos[ins] != INS_TT_NONE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
return ((insTupleTypeInfos[ins] != INS_TT_NONE)); | |
return (insTupleTypeInfos[ins] != INS_TT_NONE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/coreclr/jit/emitxarch.cpp
Outdated
inline insTupleType insTupleTypeInfo(instruction ins) | ||
{ | ||
assert((unsigned)ins < ArrLen(insTupleTypeInfos)); | ||
assert((insTupleTypeInfos[ins] != INS_TT_NONE)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
assert((insTupleTypeInfos[ins] != INS_TT_NONE)); | |
assert(insTupleTypeInfos[ins] != INS_TT_NONE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/coreclr/jit/emitxarch.cpp
Outdated
// Explore moving IsWEvexOpcodeExtension() logic inside TakesRexWPrefix(). Not doind so currently | ||
// since we cannot differentiate EVEX vs VEX without 'code' untill all paths have EVEX support. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits
// Explore moving IsWEvexOpcodeExtension() logic inside TakesRexWPrefix(). Not doind so currently | |
// since we cannot differentiate EVEX vs VEX without 'code' untill all paths have EVEX support. | |
// Explore moving IsWEvexOpcodeExtension() logic inside TakesRexWPrefix(). Not doing so currently | |
// since we cannot differentiate EVEX vs VEX without 'code' until all paths have EVEX support. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/coreclr/jit/emitxarch.h
Outdated
// id - Instruction descriptor. | ||
// | ||
// Returns: | ||
// `true` if the instruction does embeddded broadcast. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
// `true` if the instruction does embeddded broadcast. | |
// `true` if the instruction does embedded broadcast. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
7876962
to
c24f908
Compare
c24f908
to
77aa41e
Compare
77aa41e
to
0a1d2b9
Compare
No diffs, as expected. Throughput is slightly improved on x64 (about 0.09%), slightly regressed on x86 (minimal on full opts, 0.18% on MinOpts) |
INS_TT_NONE = 0x00000, | ||
INS_TT_FULL = 0x00001, | ||
INS_TT_HALF = 0x00002, | ||
INS_TT_IS_BROADCAST = 0x00003, | ||
INS_TT_FULL_MEM = 0x00010, | ||
INS_TT_TUPLE1_SCALAR = 0x00020, | ||
INS_TT_TUPLE1_FIXED = 0x00040, | ||
INS_TT_TUPLE2 = 0x00080, | ||
INS_TT_TUPLE4 = 0x00100, | ||
INS_TT_TUPLE8 = 0x00200, | ||
INS_TT_HALF_MEM = 0x00400, | ||
INS_TT_QUARTER_MEM = 0x00800, | ||
INS_TT_EIGHTH_MEM = 0x01000, | ||
INS_TT_MEM128 = 0x02000, | ||
INS_TT_MOVDDUP = 0x04000, | ||
INS_TT_IS_NON_BROADCAST = 0x7FFFC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor nits to consider for later:
- The high zero isn't needed, especially since there aren't any cases which set it. Remove it?
- Thus, the "7" in INS_TT_IS_NON_BROADCAST is not needed
- I was confused at first by INS_TT_IS_BROADCAST being 3 and not 4. Maybe rename INS_TT_IS_BROADCAST to INS_TT_BROADCAST_MASK and INS_TT_IS_NON_BROADCAST to INS_TT_NON_BROADCAST_MASK to make it clear they are bit masks? (I notice they currently aren't used)
This PR enables EVEX encoding for the following paths