Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64-SVE: Add checks for MOVPRFX sequencing (#105514) #106184

Merged
merged 21 commits into from
Aug 14, 2024

Conversation

a74nh
Copy link
Contributor

@a74nh a74nh commented Aug 9, 2024

As per the Arm Arm, MOVPRFX instructions must be followed by an SVE instruction of specific type, otherwise behaviour is undefined.

Add a function which checks that two instructions in sequence are valid. For now this only cares when the first instruction is a MOVPRFX, but the function could be expanded to check for any pairs, should we need to.

Function is called after code generation in a loop which cycles through all IGs and all instructions.
This ensures the stream is valid after peephole type optimisations.

Alternatively, this function could be called from emitInsSanityCheck(). EMIT_BACKWARDS_NAVIGATION can then be used to get the previous instruction. However, defining this adds an extra variable to instrDesc, reducing the small immediate size from 7bits to 2bits.

Alternatively, this function could be inside every emitIns_ function, allowing emitLastIns to be used for the previous instruction. However, that is a lot of call sites and some could easily be missed. In addition a peephole optimisation could still occur after this point.

Currently requires #106125 for all SVE tests to pass.
AddSequentialAcross tests disabled until #106180 is fixed.

Resolves: #105514

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 9, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Aug 9, 2024
@a74nh a74nh changed the title ARM64-SVE: Add checks for MOVPRFX sequencing ARM64-SVE: Add checks for MOVPRFX sequencing #105514 Aug 9, 2024
@a74nh a74nh marked this pull request as ready for review August 9, 2024 11:20
@a74nh
Copy link
Contributor Author

a74nh commented Aug 9, 2024

@dotnet/arm64-contrib @kunalspathak

@a74nh a74nh added the arm-sve Work related to arm64 SVE/SVE2 support label Aug 9, 2024
@a74nh a74nh changed the title ARM64-SVE: Add checks for MOVPRFX sequencing #105514 ARM64-SVE: Add checks for MOVPRFX sequencing 105514 Aug 9, 2024
@a74nh a74nh changed the title ARM64-SVE: Add checks for MOVPRFX sequencing 105514 ARM64-SVE: Add checks for MOVPRFX sequencing (105514) Aug 9, 2024
@a74nh a74nh changed the title ARM64-SVE: Add checks for MOVPRFX sequencing (105514) ARM64-SVE: Add checks for MOVPRFX sequencing (#105514) Aug 9, 2024
Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested some changes. Also, can you also confirm all stress tests are passing?

src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emit.cpp Show resolved Hide resolved

case IF_SVE_CC_2A: // <Zdn>.<T>, <V><m>
case IF_SVE_FU_2A: // <Zda>.<T>, <Zn>.<T>, #<const>
assert(secondId->idReg1() != secondId->idReg2());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was hoping to see lot more validation for registers of firstId as listed in #100743 (comment) or something similar:

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was hoping to see lot more validation for registers of firstId

That should be all at the bottom of the function. I'll double check that I have everything, because the text you have above is a much more verbose version than what I was using.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like you have most of the things covered. We need to add validation for:

  • MOVPRFX cannot be the last instruction in a sequence
  • The predicated instruction at PC+4 must use the merging predicate. (not sure if this is taken care by the format checks)
  • MOVPRFX destination register MUST be used as the destination register in the instruction at PC+4, "and is not allowed to be used in any other position other than destructive input"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • MOVPRFX cannot be the last instruction in a sequence

Done.

  • The predicated instruction at PC+4 must use the merging predicate. (not sure if this is taken care by the format checks)

Looking at all the formats, they are all /M or unpredicated. I've added some extra checking to make sure they correspond to the correct movprfx instruction.

  • MOVPRFX destination register MUST be used as the destination register in the instruction at PC+4, "and is not allowed to be used in any other position other than destructive input"

That's done in the switch. Eg:
assert(secondId->idReg1() != secondId->idReg3());
I had to do it there because you need the format to know what types the registers are.

@JulieLeeMSFT JulieLeeMSFT added this to the 9.0.0 milestone Aug 9, 2024
a74nh added 3 commits August 12, 2024 11:38
Change-Id: Ib78085c631ade5e923cddfda4def7efb362b4587
Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some more comments on validation.

src/coreclr/jit/emit.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64sve.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64sve.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64sve.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64sve.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64sve.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64sve.cpp Outdated Show resolved Hide resolved
assert(firstId->idReg2() == secondId->idReg2());

// "and have the same maximum element size (ignoring a fixed 64-bit "wide vector" operand),"
assert(firstId->idInsOpt() == secondId->idInsOpt());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be true for unpredicated version as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unpredicated movprfx doesn't care about element sizes, it is just MOVPRFX <Zd>, <Zn>.

@a74nh
Copy link
Contributor Author

a74nh commented Aug 13, 2024

Remove the changes to GenerateHWIntrinsicTests_Arm.cs.

This PR needs to wait for #106292 (and #106125) until it can be merged

Change-Id: I0a421eacf4895cfb2a30aaa6c483f4a7a3f17dd1
@a74nh
Copy link
Contributor Author

a74nh commented Aug 14, 2024

This PR needs to wait for #106292 (and #106125) until it can be merged

Those are merged now. There is nothing blocking this PR.

However - when running the stress testing I see some more errors from this PR. Will investigate.

@amanasifkhalid
Copy link
Member

However - when running the stress testing I see some more errors from this PR. Will investigate.

@a74nh I'm assuming you're getting a clean run now, right?

@a74nh
Copy link
Contributor Author

a74nh commented Aug 14, 2024

Normal testing passes with tiered and untiered. Need to check stress testing

However - when running the stress testing I see some more errors from this PR. Will investigate.

@a74nh I'm assuming you're getting a clean run now, right?

I'm not - but it's not due to this PR.

------------------- {'JitStress': '1'} -------------------
Test failed:
..........................................
..........................................
Sve.AbsoluteDifference<UInt64>(Vector<UInt64>, Vector<UInt64>): ConditionalSelectScenario_TrueValue failed:
    mask: (0, 0)
    left: (8023513538364913332, 10600524560448718829)
   right: (14956876948995427056, 7583671262694082551)
 falseOp: (8023513538364913332, 10600524560448718829)
  result: (0, 0)

I'll raise a bug and take a look at it - I have a reasonable idea of what it might be.

This PR shouldn't need to be blocked on it though.

src/coreclr/jit/emit.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emit.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/emitarm64sve.cpp Outdated Show resolved Hide resolved
@@ -2560,7 +2571,7 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)

HWIntrinsicImmOpHelper helper(this, intrin.op3, node);

for (helper.EmitBegin(); !helper.Done(); helper.EmitCaseEnd())
for (helper.EmitBegin(); !helper.Done(); helper.EmitCaseEnd(), (targetReg != op1Reg) ? 2 : 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this syntax doing? It looks like the result of the ternary operator isn't being used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That change should have been on the line above - the HWIntrinsicImmOpHelper constructor needs to take in the number of instructions that will be generated in each stage of the loop.

@amanasifkhalid
Copy link
Member

I'll raise a bug and take a look at it - I have a reasonable idea of what it might be.
This PR shouldn't need to be blocked on it though.

Got it, thanks!

@a74nh
Copy link
Contributor Author

a74nh commented Aug 14, 2024

I'll raise a bug and take a look at it - I have a reasonable idea of what it might be.

This was due to changes in #106125. I've fixed in this PR because it's related and simple enough - that PR changed the type of movprfx emitted when it shouldn't have.

Also fixed up review comments.

Just checking now that all stress testing passes now.

Copy link
Member

@amanasifkhalid amanasifkhalid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if stress tests pass. Thanks!

@a74nh
Copy link
Contributor Author

a74nh commented Aug 14, 2024

LGTM if stress tests pass. Thanks!

Full stress testing for all APIs passing now.
ro: https://gist.github.com/a74nh/b690f44a2589873fce3066592d5141fa
and same for r too.

@amanasifkhalid amanasifkhalid dismissed kunalspathak’s stale review August 14, 2024 20:11

Change request is out-of-date

@amanasifkhalid amanasifkhalid merged commit 35a6730 into dotnet:main Aug 14, 2024
106 of 111 checks passed
@a74nh a74nh deleted the movprfxvalidate_github branch August 15, 2024 07:37
@github-actions github-actions bot locked and limited conversation to collaborators Sep 14, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI arm-sve Work related to arm64 SVE/SVE2 support community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Arm64/Sve: Validation for movprfx in emitter
4 participants