-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: ARM64 - Added SVE APIs ExtractLastVector
, ExtractLastScalar
, ExtractAfterLastVector
, ExtractAfterLastScalar
#103847
Conversation
Note regarding the
|
Note regarding the
|
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics |
@dotnet/arm64-contrib @kunalspathak this is ready. Example codegen:
G_M47661_IG01: ; func=00, offs=0x000000, size=0x0010, bbWeight=1, PerfScore 3.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
IN0006: 000000 stp fp, lr, [sp, #-0x20]!
IN0007: 000004 mov fp, sp
IN0008: 000008 str xzr, [fp, #0x10] // [V00 loc0]
IN0009: 00000C str xzr, [fp, #0x18] // [V00 loc0+0x08]
G_M47661_IG02: ; offs=0x000010, size=0x0014, bbWeight=1, PerfScore 9.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB02 [0001], byref
IN0001: 000010 movi v16.4s, #0
IN0002: 000014 str q16, [fp, #0x10] // [V00 loc0]
IN0003: 000018 ldr q16, [fp, #0x10] // [V00 loc0]
IN0004: 00001C ptrue p0.s
IN0005: 000020 lasta w0, p0, z16.s
G_M47661_IG03: ; offs=0x000024, size=0x0008, bbWeight=1, PerfScore 2.00, epilog, nogc, extend
IN000a: 000024 ldp fp, lr, [sp], #0x20
IN000b: 000028 ret lr
IN0006: 000000 stp fp, lr, [sp, #-0x20]!
IN0007: 000004 mov fp, sp
IN0008: 000008 str xzr, [fp, #0x10] // [V00 loc0]
IN0009: 00000C str xzr, [fp, #0x18] // [V00 loc0+0x08]
G_M12491_IG02: ; offs=0x000010, size=0x0014, bbWeight=1, PerfScore 7.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB02 [0001], byref
IN0001: 000010 movi v16.4s, #0
IN0002: 000014 str q16, [fp, #0x10] // [V00 loc0]
IN0003: 000018 ldr q16, [fp, #0x10] // [V00 loc0]
IN0004: 00001C ptrue p0.d
IN0005: 000020 lasta z0, p0, z16.d
G_M12491_IG03: ; offs=0x000024, size=0x0008, bbWeight=1, PerfScore 2.00, epilog, nogc, extend
IN000a: 000024 ldp fp, lr, [sp], #0x20
IN000b: 000028 ret lr
G_M39689_IG01: ; func=00, offs=0x000000, size=0x0010, bbWeight=1, PerfScore 3.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
IN0006: 000000 stp fp, lr, [sp, #-0x20]!
IN0007: 000004 mov fp, sp
IN0008: 000008 str xzr, [fp, #0x10] // [V00 loc0]
IN0009: 00000C str xzr, [fp, #0x18] // [V00 loc0+0x08]
G_M39689_IG02: ; offs=0x000010, size=0x0014, bbWeight=1, PerfScore 7.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, BB02 [0001], byref
IN0001: 000010 movi v16.4s, #0
IN0002: 000014 str q16, [fp, #0x10] // [V00 loc0]
IN0003: 000018 ldr q16, [fp, #0x10] // [V00 loc0]
IN0004: 00001C ptrue p0.s
IN0005: 000020 lastb z0, p0, z16.s
G_M39689_IG03: ; offs=0x000024, size=0x0008, bbWeight=1, PerfScore 2.00, epilog, nogc, extend
IN000a: 000024 ldp fp, lr, [sp], #0x20
IN000b: 000028 ret lr Stress tests all passed:
|
@kunalspathak this is ready again. Hope this is a reasonable compromise. I didn't create a flag, but I created a function. There isn't any room for an additional flag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add ConditionalSelect tests for Extract*Vector
API.
src/tests/JIT/HardwareIntrinsics/Arm/Shared/SveExtractTest.template
Outdated
Show resolved
Hide resolved
src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs
Show resolved
Hide resolved
src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs
Outdated
Show resolved
Hide resolved
src/tests/JIT/HardwareIntrinsics/Arm/Shared/SveExtractTest.template
Outdated
Show resolved
Hide resolved
@kunalspathak this is almost done, but I'm having issues with the ConditionalSelectScenarios when TieredCompilation=0 |
are you still blocked on this or is the PR working as expected? |
I'm making progress, basically we can't remove the |
@kunalspathak this is ready, apart from some ConditionalSelect issues that we are trying to address. |
/// LASTA Wresult, Pg, Zop.B | ||
/// LASTA Bresult, Pg, Zop.B | ||
/// </summary> | ||
public static unsafe byte ExtractAfterLastScalar(Vector<byte> value) => ExtractAfterLastScalar(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that the managed signature of these APIs isn't correct (and wasn't caught in API review).
These all behave specially and need to take a mask explicitly, so should've been in a general shape that looked more like:
public static T ExtractAfterLastActiveToScalar(Vector<T> mask, Vector<T> value)
public static Vector<T> ExtractAfterLastActiveToVector(Vector<T> mask, Vector<T> value)
These are shipping as experimental in .NET 9, so we can go ahead and fix it now and follow up in API review later, but we should log an issue tracking that.
The general consideration is that LASTA
and LASTB
take a destination register
, mask register
, and input register
. The mask is used to determine which input element is being extracted, so for LASTA
(extract after last) it finds the index of the last element in mask
that is "active" (all bits set) and extracts index+1
from value
. In the case that the mask is zero
or allbitsset
, it uses index 0. While for LASTB
(extract last) it finds the index of the last element in mask
that is "active" (all bits set) and extracts that index
from value
. In the case that mask is zero
or allbitsset
, it uses the last index.
So, the signatures must explicitly take a mask and they shouldn't participate in the general EmbeddedMaskOperation
containment logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-- The names I gave above maybe aren't the best.
The concepts involved is we have what is logically similar to:
T Extract(Vector<T> vector, int index)
Vector<T> ExtractVector(Vector<T> vector, int index)
There's then a modifier on top which in other APIs is exposed as LastActiveElement
and AfterLastActiveElement
So the natural way to combine them would maybe instead be:
T ExtractAfterLastActiveElement(Vector<T> mask, Vector<T> vector);
Vector<T> ExtractVectorAfterLastActiveElement(Vector<T> mask, Vector<T> vector);
T ExtractLastActiveElement(Vector<T> mask, Vector<T> vector);
Vector<T> ExtractVectorLastActiveElement(Vector<T> mask, Vector<T> vector);
// or possibly
T ExtractFromLastActiveElement(Vector<T> mask, Vector<T> vector);
Vector<T> ExtractVectorFromLastActiveElement(Vector<T> mask, Vector<T> vector);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not super happy with ExtractVector*
though since it's not actually extracting a vector
It's extracting a scalar and simply storing it in element 0 of a destination vector register
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give that the vector form always zero extends and there isn't a great name for it, we could support it by recognizing CreateScalar(ExtractFromLastActiveElement(mask, vector))
That gives us a single understandable name and lets us generate the vector form where its appropriate still.
We are skipping these APIs for now. |
@TIHan - can we close this, since we will have to rewrite part of it anyway? |
Contributes to #99957
Adds:
ExtractLastVector
ExtractLastScalar
ExtractAfterLastVector
ExtractAfterLastScalar