Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IR support for multireg lclVars & intrinsics #34822

Merged
merged 2 commits into from
Apr 17, 2020

Conversation

CarolEidt
Copy link
Contributor

This modifies the IR to accommodate multiple registers on GenTreeLclVar and the various intrinsics. It doesn't actually enable support for this, but I have experimental work that is showing promise.
I thought it best to get this reviewed first, as it may result in some discussion.

@CarolEidt CarolEidt added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 10, 2020
@CarolEidt
Copy link
Contributor Author

@dotnet/jit-contrib PTAL - I'm trying to get an arm build and a device set up to track down those failures (they're not simple test failures, it looks like I've probably broken xunit).
However, in the meantime perhaps you could have a look at the direction of these changes.

@@ -3108,12 +3124,193 @@ struct GenTreeLclVarCommon : public GenTreeUnOp
#endif
};

//------------------------------------------------------------------------
// MultiRegSpillFlags
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These same flags were in use by multiple nodes, but with no code sharing. I'm now using them on GenTreeLclVar as well.

return 0;
}
return (gtOtherReg == REG_NA || gtOtherReg == REG_STK) ? 1 : 2;
return (TypeGet() == TYP_LONG) ? 2 : 1;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method used to return the number of actual registers, rather than the total count, but that is actually less useful. (It may be that this change is where I'm failing on arm32).


public:
var_types gtSIMDBaseType; // SIMD vector base type
unsigned char gtSIMDSize; // SIMD vector size in bytes, use 0 for scalar intrinsics
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By consolidating this information on GenTreeJitIntrinsic we can pack it better and it allows us to have a ClassLayout, which is not only useful for the multireg case, but could both streamline and improve our ability to associate the right class handle with SIMD and HW intrinsics.

Copy link
Contributor

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, a few questions.
Are you planning to apply that design for multi-reg calls or only for intrinsics?

#define GTF_VAR_MULTIREG_DEATH1 0x00040000 // GT_LCL_VAR -- The last-use bit for the second register of a multireg lclVar.
#define GTF_VAR_MULTIREG_DEATH2 0x00080000 // GT_LCL_VAR -- The last-use bit for the third register of a multireg lclVar.
#define GTF_VAR_MULTIREG_DEATH3 0x00100000 // GT_LCL_VAR -- The last-use bit for the fourth register of a multireg lclVar.
#define GTF_VAR_MULTIREG_DEATH_MASK (GTF_VAR_MULTIREG_DEATH1 | GTF_VAR_MULTIREG_DEATH2 | GTF_VAR_MULTIREG_DEATH3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the first register controlled with GTF_VAR_DEATH? Won't it be simpler to add GTF_VAR_MULTIREG_DEATH0 or change GTF_VAR_DEATH value to 0x00040000 to do #define GTF_VAR_MULTIREG_DEATH0 GTF_VAR_DEATH?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the first register is controlled with GTF_VAR_DEATH. It might be better to change the bit used for that - I'd have to check what bits are available.

{
return ((gtFlags & GTF_VAR_MULTIREG) != 0);
}
void ClearMultiReg()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which situations could we clear that flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flag means that the node represents a multireg localVar, and it is only set when all the fields are register candidates. If that changes, the flag is cleared. This makes it a bit easier to determine the right code path to take without extra checking.

@@ -1449,8 +1449,7 @@ bool GenTree::Compare(GenTree* op1, GenTree* op2, bool swapOK)
case GT_HWINTRINSIC:
if ((op1->AsHWIntrinsic()->gtHWIntrinsicId != op2->AsHWIntrinsic()->gtHWIntrinsicId) ||
(op1->AsHWIntrinsic()->gtSIMDBaseType != op2->AsHWIntrinsic()->gtSIMDBaseType) ||
(op1->AsHWIntrinsic()->gtSIMDSize != op2->AsHWIntrinsic()->gtSIMDSize) ||
(op1->AsHWIntrinsic()->gtIndexBaseType != op2->AsHWIntrinsic()->gtIndexBaseType))
(op1->AsHWIntrinsic()->gtSIMDSize != op2->AsHWIntrinsic()->gtSIMDSize))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can we ignore GetOtherBaseType now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good point! We can't.

@CarolEidt
Copy link
Contributor Author

Are you planning to apply that design for multi-reg calls or only for intrinsics?

My plan is actually to implement multi-reg calls and returns first, and then intrinsics. But I had actually started experimenting with the latter, so I thought it made sense to do all the IR changes together.


var_types GetOtherBaseType() const
{
return gtOtherBaseType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check that gtHWIntrinsicId == NI_AVX2_GatherVector128 || gtHWIntrinsicId == NI_AVX2_GatherVector256 or is it legal to call GetOtherBaseType not only for AVX2 Gather* intrinsic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add that assert, we'd have to add a check to the copy method, which I don't think is productive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I like that the new fields are private, but I find it confusing that we access gtOtherReg using this method, maybe we can make them separate fields? It won't change the node size because we allocate them as large nodes. Feel free to ignore that comment.

@@ -722,6 +722,14 @@ int GenTree::GetRegisterDstCount() const
#endif
}
#endif

#if defined(TARGET_XARCH) && defined(FEATURE_HW_INTRINSICS)
if (OperIs(GT_HWINTRINSIC))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be an "else if", instead of an "if"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, IMO they should all be if instead of else if since the previous if clauses all return.

@@ -5550,6 +5558,47 @@ bool GenTree::OperMayThrow(Compiler* comp)
return false;
}

//-----------------------------------------------------------------------------------
// GetMultiRegCount: Return the register count for a multi-reg lclVar.
//
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a comment header for the function named: GetFieldCount not GetMultiRegCount

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, thanks. I renamed them but failed to fix the comment.


//-----------------------------------------------------------------------------------
// GetRegTypeByIndex: Get a specific register's type, based on regIndex, that is produced
// by this multi-reg node.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a comment header for the function named: GetFieldTypeByIndex not GetRegTypeByIndex

#define GTF_VAR_DEATH GTF_VAR_MULTIREG_DEATH0
#define GTF_VAR_MULTIREG_DEATH1 0x00040000 // GT_LCL_VAR -- The last-use bit for the second register of a multireg lclVar.
#define GTF_VAR_MULTIREG_DEATH2 0x00080000 // GT_LCL_VAR -- The last-use bit for the third register of a multireg lclVar.
#define GTF_VAR_MULTIREG_DEATH3 0x00100000 // GT_LCL_VAR -- The last-use bit for the fourth register of a multireg lclVar.
Copy link
Contributor

@briansull briansull Apr 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that you can use these bits to track the MULTIREG_DEATH info.

Only the upper 8-bits are node specific ,
the lower 24 bits are common for all nodes: (see GTF_COMMON_MASK)
If a particular node needs more that 8 bits of flags, then another approach has to be taken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there are unused bits sprinkled about. I will restructure these to ensure that I'm using bits that aren't used by other nodes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need a flag bit on LCL_VAR too, see #34827. Hopefully enough to go around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now consolidated the GTF_VAR flags in bits 0xffc00000, and the GTF_COMMON_MASK is 0x0003FFFF after consolidating. So by my reckoning we have 4 bits in 0x003c0000 that are now available.

@tannergooding
Copy link
Member

From an API perspective, this will work best with (int Quotient, int Remainder) DivRem(int dividend, int divisor), so for APIs where we have already exposed methods like int DivRem(int dividend, int divisor, out int remainder), we will need to expose new APIs and can likely make the previous overload call the new overload, is that correct?

If so, do we expect this will be in for .NET 5 and therefore an API proposal covering existing cases should be opened/tracked? I can think of Math.DivRem, Math.BigMul, and Bmi2.MultiplyNoFlags off the top of my head, but I imagine we have a few more.

@CarolEidt
Copy link
Contributor Author

@tannergooding - I am first working on allowing assignment of multireg calls to enregistered locals. After that I'll work on the intrinsics, though I have an experimental version of Bmi2.MultiplyNoFlags working.

I'm hoping to have all of this supported for .NET 5, so I think it would be a good thing to develop a proposal. I was thinking that I could potentially synthesize a tuple in the importer, so that I could transform the version that takes an out parameter to return the tuple, but it's not easy to synthesize a struct type in the JIT. Adding that would probably be fairly involved. In my experimental implementation I added a Bmi2.MultiplyNoFlags2 and had the version with the out parameter simply call that one, and the code generated was basically the same as if you'd called the Bmi2.MultiplyNoFlags2 version directly (at least for my simple test case).

@CarolEidt
Copy link
Contributor Author

@dotnet/jit-contrib PTAL - I think I've addressed the feedback.

@CarolEidt
Copy link
Contributor Author

@dotnet/jit-contrib ping

@AndyAyersMS
Copy link
Member

I take it this is a no-diff change?

// Return Value:
// None
//
void CopyOtherRegFlags(GenTreeLclVar* from)
Copy link
Contributor

@sandreenko sandreenko Apr 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it accept from as lclVar or as a call? The header is probably incorrect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops, yes, cut and paste error.


var_types GetOtherBaseType() const
{
return gtOtherBaseType;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I like that the new fields are private, but I find it confusing that we access gtOtherReg using this method, maybe we can make them separate fields? It won't change the node size because we allocate them as large nodes. Feel free to ignore that comment.

{
gtSIMDIntrinsicID = simdIntrinsicID;
assert(gtSIMDIntrinsicID == simdIntrinsicID);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this assert check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It checks that the intrinsic ID fits into the the allocated space. At one time I'd made this smaller, before I restructured a bit. This could now be deleted.
Note, though, that GenTreeHWIntrinsic and GenTreeSIMD which derive from GenTreeJitIntrinsic are small nodes. I think we just allocate a large node for GT_INTRINSIC because it might become a call.

@CarolEidt
Copy link
Contributor Author

I take it this is a no-diff change?

Yes, that was the purpose of #34969 - to separate out the functional changes.

Copy link
Contributor

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@CarolEidt CarolEidt merged commit c2f0bb5 into dotnet:master Apr 17, 2020
@CarolEidt CarolEidt deleted the AddMultiRegNodes branch April 28, 2020 16:45
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants