-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add IR support for multireg lclVars & intrinsics #34822
Conversation
@dotnet/jit-contrib PTAL - I'm trying to get an arm build and a device set up to track down those failures (they're not simple test failures, it looks like I've probably broken xunit). |
@@ -3108,12 +3124,193 @@ struct GenTreeLclVarCommon : public GenTreeUnOp | |||
#endif | |||
}; | |||
|
|||
//------------------------------------------------------------------------ | |||
// MultiRegSpillFlags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These same flags were in use by multiple nodes, but with no code sharing. I'm now using them on GenTreeLclVar
as well.
return 0; | ||
} | ||
return (gtOtherReg == REG_NA || gtOtherReg == REG_STK) ? 1 : 2; | ||
return (TypeGet() == TYP_LONG) ? 2 : 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method used to return the number of actual registers, rather than the total count, but that is actually less useful. (It may be that this change is where I'm failing on arm32).
|
||
public: | ||
var_types gtSIMDBaseType; // SIMD vector base type | ||
unsigned char gtSIMDSize; // SIMD vector size in bytes, use 0 for scalar intrinsics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By consolidating this information on GenTreeJitIntrinsic
we can pack it better and it allows us to have a ClassLayout
, which is not only useful for the multireg case, but could both streamline and improve our ability to associate the right class handle with SIMD and HW intrinsics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general, a few questions.
Are you planning to apply that design for multi-reg calls or only for intrinsics?
src/coreclr/src/jit/gentree.h
Outdated
#define GTF_VAR_MULTIREG_DEATH1 0x00040000 // GT_LCL_VAR -- The last-use bit for the second register of a multireg lclVar. | ||
#define GTF_VAR_MULTIREG_DEATH2 0x00080000 // GT_LCL_VAR -- The last-use bit for the third register of a multireg lclVar. | ||
#define GTF_VAR_MULTIREG_DEATH3 0x00100000 // GT_LCL_VAR -- The last-use bit for the fourth register of a multireg lclVar. | ||
#define GTF_VAR_MULTIREG_DEATH_MASK (GTF_VAR_MULTIREG_DEATH1 | GTF_VAR_MULTIREG_DEATH2 | GTF_VAR_MULTIREG_DEATH3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the first register controlled with GTF_VAR_DEATH
? Won't it be simpler to add GTF_VAR_MULTIREG_DEATH0
or change GTF_VAR_DEATH
value to 0x00040000
to do #define GTF_VAR_MULTIREG_DEATH0 GTF_VAR_DEATH
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the first register is controlled with GTF_VAR_DEATH
. It might be better to change the bit used for that - I'd have to check what bits are available.
{ | ||
return ((gtFlags & GTF_VAR_MULTIREG) != 0); | ||
} | ||
void ClearMultiReg() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which situations could we clear that flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The flag means that the node represents a multireg localVar, and it is only set when all the fields are register candidates. If that changes, the flag is cleared. This makes it a bit easier to determine the right code path to take without extra checking.
src/coreclr/src/jit/gentree.cpp
Outdated
@@ -1449,8 +1449,7 @@ bool GenTree::Compare(GenTree* op1, GenTree* op2, bool swapOK) | |||
case GT_HWINTRINSIC: | |||
if ((op1->AsHWIntrinsic()->gtHWIntrinsicId != op2->AsHWIntrinsic()->gtHWIntrinsicId) || | |||
(op1->AsHWIntrinsic()->gtSIMDBaseType != op2->AsHWIntrinsic()->gtSIMDBaseType) || | |||
(op1->AsHWIntrinsic()->gtSIMDSize != op2->AsHWIntrinsic()->gtSIMDSize) || | |||
(op1->AsHWIntrinsic()->gtIndexBaseType != op2->AsHWIntrinsic()->gtIndexBaseType)) | |||
(op1->AsHWIntrinsic()->gtSIMDSize != op2->AsHWIntrinsic()->gtSIMDSize)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why can we ignore GetOtherBaseType
now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, good point! We can't.
My plan is actually to implement multi-reg calls and returns first, and then intrinsics. But I had actually started experimenting with the latter, so I thought it made sense to do all the IR changes together. |
|
||
var_types GetOtherBaseType() const | ||
{ | ||
return gtOtherBaseType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check that gtHWIntrinsicId == NI_AVX2_GatherVector128 || gtHWIntrinsicId == NI_AVX2_GatherVector256
or is it legal to call GetOtherBaseType
not only for AVX2 Gather* intrinsic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we add that assert, we'd have to add a check to the copy method, which I don't think is productive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I like that the new fields are private, but I find it confusing that we access gtOtherReg
using this method, maybe we can make them separate fields? It won't change the node size because we allocate them as large nodes. Feel free to ignore that comment.
7925cb2
to
9116cdb
Compare
@@ -722,6 +722,14 @@ int GenTree::GetRegisterDstCount() const | |||
#endif | |||
} | |||
#endif | |||
|
|||
#if defined(TARGET_XARCH) && defined(FEATURE_HW_INTRINSICS) | |||
if (OperIs(GT_HWINTRINSIC)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be an "else if", instead of an "if"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, IMO they should all be if
instead of else if
since the previous if
clauses all return.
@@ -5550,6 +5558,47 @@ bool GenTree::OperMayThrow(Compiler* comp) | |||
return false; | |||
} | |||
|
|||
//----------------------------------------------------------------------------------- | |||
// GetMultiRegCount: Return the register count for a multi-reg lclVar. | |||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a comment header for the function named: GetFieldCount
not GetMultiRegCount
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, thanks. I renamed them but failed to fix the comment.
|
||
//----------------------------------------------------------------------------------- | ||
// GetRegTypeByIndex: Get a specific register's type, based on regIndex, that is produced | ||
// by this multi-reg node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a comment header for the function named: GetFieldTypeByIndex
not GetRegTypeByIndex
src/coreclr/src/jit/gentree.h
Outdated
#define GTF_VAR_DEATH GTF_VAR_MULTIREG_DEATH0 | ||
#define GTF_VAR_MULTIREG_DEATH1 0x00040000 // GT_LCL_VAR -- The last-use bit for the second register of a multireg lclVar. | ||
#define GTF_VAR_MULTIREG_DEATH2 0x00080000 // GT_LCL_VAR -- The last-use bit for the third register of a multireg lclVar. | ||
#define GTF_VAR_MULTIREG_DEATH3 0x00100000 // GT_LCL_VAR -- The last-use bit for the fourth register of a multireg lclVar. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that you can use these bits to track the MULTIREG_DEATH info.
Only the upper 8-bits are node specific ,
the lower 24 bits are common for all nodes: (see GTF_COMMON_MASK)
If a particular node needs more that 8 bits of flags, then another approach has to be taken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, there are unused bits sprinkled about. I will restructure these to ensure that I'm using bits that aren't used by other nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need a flag bit on LCL_VAR too, see #34827. Hopefully enough to go around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now consolidated the GTF_VAR
flags in bits 0xffc00000
, and the GTF_COMMON_MASK
is 0x0003FFFF
after consolidating. So by my reckoning we have 4 bits in 0x003c0000
that are now available.
ec72066
to
3025aaa
Compare
3025aaa
to
0de1a70
Compare
From an API perspective, this will work best with If so, do we expect this will be in for .NET 5 and therefore an API proposal covering existing cases should be opened/tracked? I can think of |
@tannergooding - I am first working on allowing assignment of multireg calls to enregistered locals. After that I'll work on the intrinsics, though I have an experimental version of I'm hoping to have all of this supported for .NET 5, so I think it would be a good thing to develop a proposal. I was thinking that I could potentially synthesize a tuple in the importer, so that I could transform the version that takes an out parameter to return the tuple, but it's not easy to synthesize a struct type in the JIT. Adding that would probably be fairly involved. In my experimental implementation I added a |
@dotnet/jit-contrib PTAL - I think I've addressed the feedback. |
@dotnet/jit-contrib ping |
I take it this is a no-diff change? |
// Return Value: | ||
// None | ||
// | ||
void CopyOtherRegFlags(GenTreeLclVar* from) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it accept from
as lclVar or as a call? The header is probably incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooops, yes, cut and paste error.
|
||
var_types GetOtherBaseType() const | ||
{ | ||
return gtOtherBaseType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I like that the new fields are private, but I find it confusing that we access gtOtherReg
using this method, maybe we can make them separate fields? It won't change the node size because we allocate them as large nodes. Feel free to ignore that comment.
src/coreclr/src/jit/gentree.h
Outdated
{ | ||
gtSIMDIntrinsicID = simdIntrinsicID; | ||
assert(gtSIMDIntrinsicID == simdIntrinsicID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this assert check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It checks that the intrinsic ID fits into the the allocated space. At one time I'd made this smaller, before I restructured a bit. This could now be deleted.
Note, though, that GenTreeHWIntrinsic
and GenTreeSIMD
which derive from GenTreeJitIntrinsic
are small nodes. I think we just allocate a large node for GT_INTRINSIC because it might become a call.
Yes, that was the purpose of #34969 - to separate out the functional changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This modifies the IR to accommodate multiple registers on
GenTreeLclVar
and the various intrinsics. It doesn't actually enable support for this, but I have experimental work that is showing promise.I thought it best to get this reviewed first, as it may result in some discussion.