-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT ARM64-SVE: Add FK_3{A,B,C}, EJ_3A, EK_3A, EY_3B, EW_3{A,B} #98187
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsPart of #94549. Implements the following encodings:
cstool output:
JitDisasm output:
cc @dotnet/arm64-contrib
|
@a74nh sorry I stole one of your encodings; I thought I would get |
@@ -5912,6 +5912,42 @@ void CodeGen::genArm64EmitterUnitTestsSve() | |||
theEmitter->emitIns_R_R_R_I(INS_sve_udot, EA_SCALABLE, REG_V7, REG_V8, REG_V3, 3, | |||
INS_OPTS_SCALABLE_H); // UDOT <Zda>.S, <Zn>.H, <Zm>.H[<imm>] | |||
|
|||
// IF_SVE_EJ_3A | |||
theEmitter->emitIns_R_R_R_I(INS_sve_cdot, EA_SCALABLE, REG_V0, REG_V1, REG_V2, 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these instructions that take a rotation value, shouldn't we be passing 0, 90, 180, 270 instead of 0, 1, 2, 3?
I also implemented a few instructions that needed rotation values here: #98141 which I am passing 0, 90, 180, 270.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks a little weird passing 0-3 instead of 0-270, but I did this to match the bit-level representation of the rotation value, so that we don't need a helper method to encode the rr
bits; then when displaying the instruction in JitDisasms, I multiply the immediate by 90 to display it correctly. I'm fine changing my approach to match yours, though I'll have to update a few encodings already merged in. @kunalspathak @a74nh do you have any preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I look at is the emitIns
functions are APIs that should try to match what the instructions actually are. The bit-level representation/encoding is an implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. In that case, how about I wait for #98141 to be merged in, and then I'll update my encodings that use rotation values to use the helper methods you added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems fair to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer we write for readability first if we know that such an optimization would not make any difference in 99.9% of scenarios.
However, I'm fine with encoding it as 0, 1, 2, 3 on instrDesc
if that is what we all want. I'll have to adjust my work as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer we write for readability first
It is readable from the perspective of calling the emitIns*
method as seen here: https://github.com/dotnet/runtime/pull/98187/files#diff-d4f9f119d0a321cea7e82023cb754d8abdb800d6185c8bb9464d389ebd50debcR6288
and we flip it just before saving it in instrdesc:
I am not sure if emitOutputInstr()
method is readable anyway :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sympathetic to the readability argument. I guess the silver lining with our current approach is the bitwise representation of the rotation value is abstracted away from the API surface (i.e. the emitIns
methods). Maybe I'm being naive, but I don't anticipate the code for handling the rotation values in emitIns
or emitDispInsHelp
changing with any frequency after this is merged in, whereas the usage of emitIns
will certainly increase once we start using these SVE instructions. So in the "important" case, readability isn't hindered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't necessarily about making emitOutputInstr
readable, but about the display code being simple. We will have to decode the imm
whose values are 0-3 to be translated to 0-270 on display.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess what I'm trying to say is, if we encode the values as 0-3 on the instrDesc
, we will have to have an encode
/decode
for it, whereas if we store 0-270, we only need one encode
function.
src/coreclr/jit/emitarm64.cpp
Outdated
{ | ||
assert(isValidUimm4(imm)); // ii rr | ||
assert((REG_V0 <= reg3) && (reg3 <= REG_V7)); | ||
fmt = IF_SVE_FA_3A; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are these changes? I see that we moved it under emitIns_R_R_R_I_I()
and wondering why this was not done when we implemented SVE_FA_3A
, SVE_FA_3B
, etc. ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I first added emitIns_R_R_R_I_I
, I was trying to minimize code duplication, so I just made it a wrapper for emitIns_R_R_R_I
by bitwise OR-ing imm1
and imm2
into one imm
, and then passing this along to emitIns_R_R_R_I
to do the rest. Now I'm running into instructions that have encodings that take one immediate, and encodings that take two immediates, so it's easier to separate these two emitIns
methods out.
src/coreclr/jit/emitarm64.cpp
Outdated
else | ||
{ | ||
assert(opt == INS_OPTS_SCALABLE_D); | ||
assert((REG_V0 <= reg3) && (reg3 <= REG_V15)); // mmmm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth adding a function like isLowVectorRegister()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing.
@kunalspathak thanks for the review -- I applied your feedback |
Diff results for #98187Throughput diffsThroughput diffs for linux/arm64 ran on linux/x64MinOpts (+0.00% to +0.01%)
Details here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
Diff results for #98187Throughput diffsThroughput diffs for windows/arm64 ran on windows/x64MinOpts (-0.00% to +0.01%)
Details here Throughput diffs for linux/arm64 ran on linux/x64MinOpts (+0.00% to +0.01%)
Details here |
Diff results for #98187Throughput diffsThroughput diffs for linux/arm64 ran on linux/x64MinOpts (+0.00% to +0.01%)
Details here |
Part of #94549. Implements the following encodings:
cstool output:
JitDisasm output:
cc @dotnet/arm64-contrib