Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arm64/Sve: Predicated Abs, Predicated/UnPredicated Add, Conditional Select #100743

Merged
merged 52 commits into from
Apr 24, 2024

Conversation

kunalspathak
Copy link
Member

@kunalspathak kunalspathak commented Apr 7, 2024

Based on the feedback from @tannergooding in #100134 (comment), I reworked #100134 a little bit and also included the ConditionalSelect I implemented in #100718.

The change in design is that we do not touch the HWIntrinsic node until lowerer. In lowerer, we wrap the node in ConditionalSelect(mask, original_operation, falseVal) and contain the original operation. This lets us easily determine which API should map to the predicate vs. unpredicated version of the instruction.

  • For APIs like Abs which only has predicated version, in lowerer, we wrap it in ConditionalSelect(ptrueAll, Abs, zero). During containment analysis, for ConditionalSelect, we check if 2nd operand is scalable SVE instrinsic and if yes, we mark it as contained. During codegen, the ConditionalSelect that has op2 as contained needs to go through the "predicated" version of the instruction.
  • For APIs like Add that has both predicated and unpredicated versions, they are handled differently. For unpredicated Add, it does not get marked as contained and in codegen, generates unpredicated version. In order to generate predicated version of add, user has to write a code such that Add is wrap inside a conditional, e.g. ConditionalSelect(mask, Add(x,y), b). If yes, then it follows same path as Abs and generates predicate version of the instruction.

There are still some handling needs to be done for RMW, but want to get feedback on the design before I move forward. The sample test case along with the output can be found in https://gist.github.com/kunalspathak/bc4e917ced68bef793d11fcbd050162c
https://gist.github.com/kunalspathak/1fb0b17f0908ba26e46f0cd146ab05b8 .

I still need to handle cases where user writes ConditionalSelect(mask, Abs(x), y). Currently we end of expanding Abs with another ConditionalSelect.

Thanks @tannergooding for the design discussion and helping me understand the concepts.

@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI new-api-needs-documentation labels Apr 7, 2024
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@kunalspathak
Copy link
Member Author

@tannergooding - can you take another look? I have added bunch of scenarios to use `ConditionalSelect() on unary/binary operations including:

  • Pass op1 as falseValue in ConditionalSelect() done on top of the API
  • Pass op2 as falseValue in ConditionalSelect() done on top of the API
  • Pass Zero as falseValue in ConditionalSelect() done on top of the API
  • Pass Zero as maskValue in ConditionalSelect() done on top of the API
  • Pass AllBitSet as maskValue in ConditionalSelect() done on top of the API
  • Make sure maskValue has mix of 0s as well.

I also tested the newly added test cases using https://github.com/a74nh/runtime/blob/api_github/sve_api/stress_tester.py and they all pass.

{
assert(numArgs > 0);
GenTree* op1 = retNode->AsHWIntrinsic()->Op(1);
if (intrinsic == NI_Sve_ConditionalSelect)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not important for this PR, but this is potentially something that should be handled in gtNewSimdCndSelNode instead and then more generally as part of morph to capture values that don't materialize as constants until later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do it in follow-up PR.

@a74nh
Copy link
Contributor

a74nh commented Apr 25, 2024

I'm getting a failure with the abs and add tests:

❯ $CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_Abs_int
14:07:31.270 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Abs_int()
Supported ISAs:
  AdvSimd:   True
  Aes:       True
  ArmBase:   True
  Crc32:     True
  Dp:        True
  Rdm:       True
  Sha1:      True
  Sha256:    True
  Sve:       True


Assert failure(PID 1134821 [0x001150e5], Thread: 1134821 [0x1150e5]): Assertion failed 'isVectorRegister(reg2)' in 'JIT.HardwareIntrinsics.Arm._Sve.SimpleUnaryOpTest__Sve_Abs_int:RunBasicScenario_UnsafeRead():this' during 'Generate code' (IL size 83; hash 0x0487a6e9; Tier0)

    File: /home/alahay01/dotnet/runtime_sve_api/src/coreclr/jit/emitarm64sve.cpp:2248
    Image: /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun

[1]    1134821 abort (core dumped)  $CORE_ROOT/corerun  Sve_Abs_int

Backtrace:

#0  DBG_DebugBreak () at /home/alahay01/dotnet/runtime_sve_api/src/coreclr/pal/src/arch/arm64/debugbreak.S:7
#1  0x0000ffffed1029d0 in DebugBreak () at /home/alahay01/dotnet/runtime_sve_api/src/coreclr/pal/src/debug/debug.cpp:407
#2  0x0000ffffecdf8440 in assertAbort (why=0xffffecc8c644 "isVectorRegister(reg2)",
    file=0xffffecc66283 "/home/alahay01/dotnet/runtime_sve_api/src/coreclr/jit/emitarm64sve.cpp", line=2248)
    at /home/alahay01/dotnet/runtime_sve_api/src/coreclr/jit/error.cpp:288
#3  0x0000ffffed099178 in emitter::emitInsSve_R_R (this=0xaaaaab99a400, ins=INS_sve_movprfx, attr=EA_SCALABLE, reg1=REG_V16, reg2=REG_COUNT,
    opt=INS_OPTS_NONE, sopt=INS_SCALABLE_OPTS_NONE) at /home/alahay01/dotnet/runtime_sve_api/src/coreclr/jit/emitarm64sve.cpp:2248
#4  0x0000ffffed0e8bf0 in CodeGen::genHWIntrinsic (this=0xaaaaab999b08, node=<optimized out>)
    at /home/alahay01/dotnet/runtime_sve_api/src/coreclr/jit/hwintrinsiccodegenarm64.cpp:444
#5  0x0000ffffecdc3440 in CodeGen::genCodeForBBlist (this=0xaaaaab999b08)
    at /home/alahay01/dotnet/runtime_sve_api/src/coreclr/jit/codegenlinear.cpp:472
#6  0x0000ffffecdb6aa0 in CodeGen::genGenerateMachineCode (this=0xaaaaab999b08)
    at /home/alahay01/dotnet/runtime_sve_api/src/coreclr/jit/codegencommon.cpp:1877
#7  0x0000ffffecdc1a70 in CodeGenPhase::DoPhase (this=<optimized out>)
    at /home/alahay01/dotnet/runtime_sve_api/src/coreclr/jit/codegen.h:1703

Looks like reg2 is invalid in frame 3. reg2=REG_COUNT

@a74nh
Copy link
Contributor

a74nh commented Apr 25, 2024

Also some (but not all) of the conditional tests are failing:

❯ $CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_ConditionalSelect_sbyte
14:12:14.806 Running test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_ConditionalSelect_sbyte()
Supported ISAs:
  AdvSimd:   True
  Aes:       True
  ArmBase:   True
  Crc32:     True
  Dp:        True
  Rdm:       True
  Sha1:      True
  Sha256:    True
  Sve:       True

Beginning scenario: RunBasicScenario_UnsafeRead
Sve.ConditionalSelect<SByte>(Vector<SByte>, Vector<SByte>, Vector<SByte>): RunBasicScenario_UnsafeRead failed:
 firstOp: (29, 69, 117, 63, 117, 109, 74, 55, 16, 108, 36, 47, 71, 38, 19, 120)
secondOp: (56, 113, 53, 30, 67, 23, 97, 124, 31, 20, 23, 120, 105, 44, 50, 103)
 thirdOp: (44, 17, 40, 89, 16, 50, 66, 38, 4, 12, 56, 112, 105, 17, 76, 72)
  result: (-1, -1, 0, 0, 67, 23, 97, 124, 31, 20, 23, 120, 105, 44, 50, 103)

Beginning scenario: RunBasicScenario_Load
Sve.ConditionalSelect<SByte>(Vector<SByte>, Vector<SByte>, Vector<SByte>): RunBasicScenario_Load failed:
 firstOp: (29, 69, 117, 63, 117, 109, 74, 55, 16, 108, 36, 47, 71, 38, 19, 120)
secondOp: (56, 113, 53, 30, 67, 23, 97, 124, 31, 20, 23, 120, 105, 44, 50, 103)
 thirdOp: (44, 17, 40, 89, 16, 50, 66, 38, 4, 12, 56, 112, 105, 17, 76, 72)
  result: (-1, -1, 0, 0, 67, 23, 97, 124, 31, 20, 23, 120, 105, 44, 50, 103)

Beginning scenario: RunReflectionScenario_UnsafeRead
Beginning scenario: RunLclVarScenario_UnsafeRead
Beginning scenario: RunClassFldScenario
Beginning scenario: RunStructLclFldScenario
Beginning scenario: RunStructFldScenario
System.Exception: One or more scenarios did not complete as expected.
   at JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_ConditionalSelect_sbyte() in /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/obj/linux.arm64.Checked/Managed/JIT/HardwareIntrinsics/Arm/Sve/Sve_ro/Sve_ro/gen/Sve.ConditionalSelect.sbyte.cs:line 62
   at Program.<<Main>$>g__TestExecutor2723|0_2724(StreamWriter tempLogSw, StreamWriter statsCsvSw, <>c__DisplayClass0_0&) in /home/alahay01/dotnet/runtime_sve_api/artifacts/tests/coreclr/obj/linux.arm64.Checked/Managed/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/generated/XUnitWrapperGenerator/XUnitWrapperGenerator.XUnitWrapperGenerator/FullRunner.g.cs:line 68199
14:12:15.143 Failed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_ConditionalSelect_sbyte()

@kunalspathak
Copy link
Member Author

Thanks @a74nh - i will take a look.

@kunalspathak
Copy link
Member Author

@a74nh - The Abs failures are known and are fixed in #101515. With that PR and main, all the tests are passing for me as seen in https://gist.github.com/kunalspathak/73e0cfd469a70fd4b5466bada45df2eb. So perhaps, it could be 256-bit machine difference? Can you try restricting it to 128-bit and check if they still fail for you?

@a74nh
Copy link
Contributor

a74nh commented Apr 26, 2024

So perhaps, it could be 256-bit machine difference? Can you try restricting it to 128-bit and check if they still fail for you?

Yes, it looks like the conditionalselect failures are 256bit issues. Works on 256bit machine when I restrict the vector length to 128bit.

matouskozak pushed a commit to matouskozak/runtime that referenced this pull request Apr 30, 2024
…elect (dotnet#100743)

* JIT ARM64-SVE: Add Sve.Abs() and Sve.Add()

Change-Id: Ie8cfe828595da9a87adbc0857c0c44c0ce12f5b2

* Fix sve scaling in enitIns_R_S/S_R

* Revert "Fix sve scaling in enitIns_R_S/S_R"

This reverts commit e9fa735.

* Fix sve scaling in enitIns_R_S/S_R

* Restore testing

* Use NaturalScale_helper for vector load/stores

* wip

* Add ConditionalSelect() APIs

* Handle ConditionalSelect in JIT

* Add test coverage

* Update the test cases

* jit format

* fix merge conflicts

* Make predicated/unpredicated work with ConditionalSelect

Still some handling around RMW is needed, but this basically works

* Misc. changes

* jit format

* jit format

* Handle all the conditions correctly

* jit format

* fix some spacing

* Removed the assert

* fix the largest vector size to 64 to fix dotnet#100366

* review feedback

* wip

* Add SVE feature detection for Windows

* fix the check for invalid alignment

* Revert "Add SVE feature detection for Windows"

This reverts commit ed7c781.

* Handle case where Abs() is wrapped in another conditionalSelect

* jit format

* fix the size comparison

* HW_Flag_MaskedPredicatedOnlyOperation

* Revert the change in emitarm64.cpp around INS_sve_ldr_mask/INS_sve_str_mask

* Fix the condition for lowering

* address review feedback for movprfx

* Move the special handling of Vector<>.Zero from lowerer to importer

* Rename IsEmbeddedMaskedOperation/IsOptionalEmbeddedMaskedOperation

* Add more test coverage for conditionalSelect

* Rename test method name

* Add more test coverage for conditionalSelect:Abs

* jit format

* Add logging on test methods

* Add the missing movprfx for abs

* Add few more scenarios where falseVal is zero

* Make sure LoadVector is marked as explicit needing mask

* revisit the codegen logic

* Remove commented code and add some other comments

* jit format

---------

Co-authored-by: Alan Hayward <alan.hayward@arm.com>
michaelgsharp pushed a commit to michaelgsharp/runtime that referenced this pull request May 9, 2024
…elect (dotnet#100743)

* JIT ARM64-SVE: Add Sve.Abs() and Sve.Add()

Change-Id: Ie8cfe828595da9a87adbc0857c0c44c0ce12f5b2

* Fix sve scaling in enitIns_R_S/S_R

* Revert "Fix sve scaling in enitIns_R_S/S_R"

This reverts commit e9fa735.

* Fix sve scaling in enitIns_R_S/S_R

* Restore testing

* Use NaturalScale_helper for vector load/stores

* wip

* Add ConditionalSelect() APIs

* Handle ConditionalSelect in JIT

* Add test coverage

* Update the test cases

* jit format

* fix merge conflicts

* Make predicated/unpredicated work with ConditionalSelect

Still some handling around RMW is needed, but this basically works

* Misc. changes

* jit format

* jit format

* Handle all the conditions correctly

* jit format

* fix some spacing

* Removed the assert

* fix the largest vector size to 64 to fix dotnet#100366

* review feedback

* wip

* Add SVE feature detection for Windows

* fix the check for invalid alignment

* Revert "Add SVE feature detection for Windows"

This reverts commit ed7c781.

* Handle case where Abs() is wrapped in another conditionalSelect

* jit format

* fix the size comparison

* HW_Flag_MaskedPredicatedOnlyOperation

* Revert the change in emitarm64.cpp around INS_sve_ldr_mask/INS_sve_str_mask

* Fix the condition for lowering

* address review feedback for movprfx

* Move the special handling of Vector<>.Zero from lowerer to importer

* Rename IsEmbeddedMaskedOperation/IsOptionalEmbeddedMaskedOperation

* Add more test coverage for conditionalSelect

* Rename test method name

* Add more test coverage for conditionalSelect:Abs

* jit format

* Add logging on test methods

* Add the missing movprfx for abs

* Add few more scenarios where falseVal is zero

* Make sure LoadVector is marked as explicit needing mask

* revisit the codegen logic

* Remove commented code and add some other comments

* jit format

---------

Co-authored-by: Alan Hayward <alan.hayward@arm.com>
@github-actions github-actions bot locked and limited conversation to collaborators May 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI arm-sve Work related to arm64 SVE/SVE2 support new-api-needs-documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants