-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Sve.DuplicateSelectedScalarToVector() #103228
Add support for Sve.DuplicateSelectedScalarToVector() #103228
Conversation
Note regarding the
|
1 similar comment
Note regarding the
|
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics |
@a74nh @kunalspathak @dotnet/arm64-contrib @arch-arm64-sve |
There are a few assertion failures, related to STR. Taking a further look. Stress test results
|
I have seen them in the past, but yes, if you can find out what is the cause, that will be good. |
All stress tests are passing now. Stress test results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added few things to check.
src/coreclr/jit/hwintrinsicarm64.cpp
Outdated
@@ -421,6 +421,10 @@ void HWIntrinsicInfo::lookupImmBounds( | |||
immUpperBound = (int)SVE_PATTERN_ALL; | |||
break; | |||
|
|||
case NI_Sve_DuplicateSelectedScalarToVector: | |||
immUpperBound = (512 / (genTypeSize(baseType) * BITS_PER_BYTE)) - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
immUpperBound = (512 / (genTypeSize(baseType) * BITS_PER_BYTE)) - 1; | |
immUpperBound = Compiler::getSIMDVectorLength(simdSize, baseType) - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may not able able to use the getSIMDVectorLength()
here as the imm
for DUP
seems special [1].
Is the immediate index, in the range 0 to one less than the number of elements in 512 bits, encoded in "imm2:tsz".
I didn't find a better helper method for this so did it explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually little confused with that description and then this one:
The immediate element index is in the range of 0 to 63 (bytes), 31 (halfwords), 15 (words), 7 (doublewords) or 3 (quadwords).
With ^ description, it sounded me like getSIMDVectorLength()
, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit confusing. Unlike other cases, the index here does not depend on the vector length. A valid index range is fixed based on the element type, e.g., 0 to 63 for vectors of type byte.
ValidateResult(_dataTable.inArrayPtr, {Imm}, _dataTable.outArrayPtr); | ||
} | ||
|
||
public void RunBasicScenario_UnsafeRead_InvalidImm() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you confirm what is the behavior here?
From https://docsmirror.github.io/A64/2023-06/dup_z_zi.html, is it validating this portion:
Selecting an element beyond the accessible vector length causes the destination vector to be set to zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invalid index refers to a value that is permitted but out of bounds on the current system. E.g., bytes permitted index values are between 0 to 63 but on a 256bit SVE length machine, valid indices can be between 256 / 8 - 1 = 0 to 31. Thus, the indices between 32 to 63 would set the destination value to zero.
src/tests/Common/GenerateHWIntrinsicTests/GenerateHWIntrinsicTests_Arm.cs
Outdated
Show resolved
Hide resolved
src/tests/Common/GenerateHWIntrinsicTests/GenerateHWIntrinsicTests_Arm.cs
Outdated
Show resolved
Hide resolved
The tests with short are failing with the following assertion error. Not sure if it's related to the patch.
|
JIT dump for a failing method. |
@SwapnilGaikwad - Are you sure this one is the right dump? I see the compilation succeed in this dump. |
This is to do with that a node is of type |
Hi Kunal, apologies. It was dump of the parent test method that emitted a call to |
I think the problem is in |
@SwapnilGaikwad - I assume you don't need anything from me for this PR? |
Hi @kunalspathak , I haven't exhausted all the debugging possibilities yet. I might come back with questions 🙂 . |
Hi @kunalspathak , your suggestion was very useful to identify the issue. The mask was hardcoded to simd registers for op2. All stress tests pass
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. Have a question about the change in codegenarm64.cpp
.
Also, how are we exposing DUP (immediate) and DUP (scalar) ?
@@ -8150,9 +8153,11 @@ void emitter::emitIns_S_R(instruction ins, emitAttr attr, regNumber reg1, int va | |||
{ | |||
useRegForImm = true; | |||
regNumber rsvdReg = codeGen->rsGetRsvdReg(); | |||
codeGen->instGen_Set_Reg_To_Imm(EA_PTRSIZE, rsvdReg, imm); | |||
// For larger imm values (> 9 bits), calculate base + imm in a reserved register first. | |||
codeGen->instGen_Set_Reg_To_Base_Plus_Imm(EA_PTRSIZE, rsvdReg, reg2, imm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change related to the DUP
? Don't think so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might actually need it for https://github.com/dotnet/runtime/pull/104065/files#diff-2b2c8b9011607926410624d6f81613fad7b74c6e0516d578675a8b792998fe4fR7893-R7896, but I am curious if you found a repro, for which you had to add change here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not related to DUP directly. However, tests trigger a scenario where we end up in having a store with a larger immediate.
Spoke offline that we will use these in future when we intrinsify |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Contribute towards #99957.