Arm64/Sve: Add FFR register liveness tracking #105348
@@ -303,6 +303,7 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)

    emitAttr emitSize;
    insOpts opt;
    bool unspilledFfr = false;

    if (HWIntrinsicInfo::SIMDScalar(intrin.id))
    {

@@ -318,6 +319,39 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)
    {
        emitSize = EA_SCALABLE;
        opt = emitter::optGetSveInsOpt(emitTypeSize(intrin.baseType));

        switch (intrin.id)
        {
            case NI_Sve_GetFfrByte:
            case NI_Sve_GetFfrInt16:
            case NI_Sve_GetFfrInt32:
            case NI_Sve_GetFfrInt64:
            case NI_Sve_GetFfrSByte:
            case NI_Sve_GetFfrUInt16:
            case NI_Sve_GetFfrUInt32:
            case NI_Sve_GetFfrUInt64:
            {
                if ((intrin.op1 != nullptr) && ((intrin.op1->gtFlags & GTF_SPILLED) != 0))
                {
                    // If there was a op1 for this intrinsic, it means FFR is consumed here
                    // and we need to unspill.
                    unspilledFfr = true;
                }
                break;
            }
            case NI_Sve_LoadVectorFirstFaulting:
            {
                if ((intrin.op3 != nullptr) && ((intrin.op3->gtFlags & GTF_SPILLED) != 0))
                {
                    // If there was a op3 for this intrinsic, it means FFR is consumed here
                    // and we need to unspill.
                    unspilledFfr = true;
                }
                break;
            }
            default:
                break;
        }
    }
    else if (intrin.category == HW_Category_Special)
    {

@@ -2366,6 +2400,43 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)
                break;
            }

            case NI_Sve_LoadVectorFirstFaulting:
            {
                if (unspilledFfr)
                {
                    // We have unspilled the FFR in op1Reg. Restore it back in FFR register.
                    GetEmitter()->emitIns_R(INS_sve_wrffr, emitSize, op1Reg, opt);
This should be
                }

                insScalableOpts sopt = (opt == INS_OPTS_SCALABLE_B) ? INS_SCALABLE_OPTS_NONE : INS_SCALABLE_OPTS_LSL_N;
                GetEmitter()->emitIns_R_R_R_R(ins, emitSize, targetReg, op1Reg, op2Reg, REG_ZR, opt, sopt);
                break;
            }

            case NI_Sve_GetFfrByte:
            case NI_Sve_GetFfrInt16:
            case NI_Sve_GetFfrInt32:
            case NI_Sve_GetFfrInt64:
            case NI_Sve_GetFfrSByte:
            case NI_Sve_GetFfrUInt16:
            case NI_Sve_GetFfrUInt32:
            case NI_Sve_GetFfrUInt64:
            {
                if (unspilledFfr)
                {
                    // We have unspilled the FFR in op1Reg. Restore it back in FFR register.
                    GetEmitter()->emitIns_R(INS_sve_wrffr, emitSize, op1Reg, opt);
Could the codegen for these intrinsics just emit
We could, but I will have to double-check whether that is fine to do, because sometimes we do want to make sure
                }

                GetEmitter()->emitIns_R(ins, emitSize, targetReg, INS_OPTS_SCALABLE_B);
                break;
            }
            case NI_Sve_SetFfr:
            {
                assert(targetReg == REG_NA);
                GetEmitter()->emitIns_R(ins, emitSize, op1Reg, opt);
                break;
            }
            case NI_Sve_ConditionalExtractAfterLastActiveElementScalar:
            case NI_Sve_ConditionalExtractLastActiveElementScalar:
            {


So we don't call `genConsumeReg` on this operand? Is `GTF_SPILLED` the only thing that needs to be handled then?
I don't really understand what the FFR register is, but what happens if it gets changed by a callee and we didn't end up spilling the local (that represents p14) around that call? Is that possible?

I guess we do call it, and this is detecting if it unspilled into `p14`? I wonder if unspilling itself could have the code to mirror p14 to ffr. I'm sort of worried about what can happen with GT_RELOAD or GT_COPY here... You may want to run jitstressregs (not sure if we'll have any coverage, though?)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the entire purpose of this PR, to make sure that we spill it around the call. Since all predicate registers are callee trash, using
P14
does force to save across the call (if it is live).yes, unspilling will happen as part of
genConsumeMultiOpOperands()
.I could, but that will exercise in the common code path. Having it here already restricts us to do this for scalable APIs which are SVE only.
Unfortunately we have basic coverage of the CI pipeline that I added few days back that tests with
AltJit
, but not with jitstressregs. I will see if I can test it locally.There was a problem hiding this comment.
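
For readers unfamiliar with the concern in this exchange, here is a minimal standalone sketch of why the value needs to be saved across a call and then written back. It is a toy model, not JIT code: `ToyMachine`, `toyCall`, and `ffrAfterCall` are hypothetical names, and the assumption that a call overwrites both P14 and FFR with all-ones is made up purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: none of these types or functions exist in the JIT.
struct ToyMachine
{
    uint16_t ffr = 0; // stand-in for the first-fault register
    uint16_t p14 = 0; // stand-in for the predicate register that mirrors FFR
};

// Assumption for illustration: a call trashes every predicate register,
// and FFR along with them.
void toyCall(ToyMachine& m)
{
    m.ffr = 0xFFFF;
    m.p14 = 0xFFFF;
}

// Returns the FFR value observed after a call, with or without spilling
// the P14 mirror around that call.
uint16_t ffrAfterCall(uint16_t initialFfr, bool spillAroundCall)
{
    ToyMachine m;
    m.ffr = initialFfr;
    m.p14 = m.ffr; // FFR is tracked through its P14 mirror

    uint16_t stackSlot = 0;
    if (spillAroundCall)
    {
        stackSlot = m.p14; // spill: the mirror is live across the call
    }

    toyCall(m);

    if (spillAroundCall)
    {
        m.p14 = stackSlot; // unspill into P14
        m.ffr = m.p14;     // write P14 back to FFR (the role wrffr plays in the diff)
    }
    return m.ffr;
}
```

In this model, `ffrAfterCall(0x00F3, true)` returns the original `0x00F3`, while `ffrAfterCall(0x00F3, false)` returns the trashed `0xFFFF`.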
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i didn't get a good repro, but I can potentially add a
gtSkipReloadOrCopy
here to bypass the COPY/RELOAD to check if i will be unspilling here or not. TheconsumeRegs
will anyway do the right thing.
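
The idea in the last comment can be sketched roughly as follows. This is a toy model under stated assumptions, not the JIT's node type: `ToyNode`, `ToyOper`, `TOY_SPILLED`, `toySkipReloadOrCopy`, and `toyOperandWasSpilled` are hypothetical stand-ins, and the sketch assumes the spilled flag lives on the node underneath a COPY/RELOAD wrapper.

```cpp
#include <cassert>

// Toy stand-ins; the real JIT node type and flag values differ.
enum ToyOper
{
    TOY_LCL_VAR,
    TOY_COPY,
    TOY_RELOAD
};

constexpr unsigned TOY_SPILLED = 0x1; // hypothetical flag bit

struct ToyNode
{
    ToyOper  oper;
    unsigned flags;
    ToyNode* op1; // wrapped operand for TOY_COPY / TOY_RELOAD, otherwise nullptr
};

// Step past a single COPY/RELOAD wrapper, if one is present.
ToyNode* toySkipReloadOrCopy(ToyNode* node)
{
    if ((node->oper == TOY_COPY) || (node->oper == TOY_RELOAD))
    {
        return node->op1;
    }
    return node;
}

// True when the operand, ignoring a COPY/RELOAD wrapper, carries the spilled flag.
bool toyOperandWasSpilled(ToyNode* operand)
{
    if (operand == nullptr)
    {
        return false;
    }
    return (toySkipReloadOrCopy(operand)->flags & TOY_SPILLED) != 0;
}
```

With a check shaped like this, an operand wrapped in a reload node would still be recognized as spilled, which a direct flag test on the wrapper would miss.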