ARM64 - Emitting `msub` instruction #66621

TIHan · 2022-03-14T23:40:05Z

Addresses the final piece of issue #34937, though this PR does not explicitly change MOD or UMOD.

Description

The expression `a - b * c` can be optimized to a single instruction, `msub`, on ARM64.
The optimization applies only to integral types.
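
As a rough illustration of what the instruction computes (a sketch, not JIT code): `msub Rd, Rn, Rm, Ra` yields `Ra - Rn * Rm`. The helper names `msub32` and `mul_then_sub` below are hypothetical, and unsigned arithmetic is used so that wraparound is well defined in C++.

```cpp
#include <cstdint>

// Hypothetical model of the 32-bit form `msub wd, wn, wm, wa`:
// wd = wa - wn * wm, computed modulo 2^32.
constexpr std::uint32_t msub32(std::uint32_t wn, std::uint32_t wm, std::uint32_t wa)
{
    return wa - wn * wm;
}

// The two-instruction sequence that msub replaces: mul, then sub.
constexpr std::uint32_t mul_then_sub(std::uint32_t a, std::uint32_t b, std::uint32_t c)
{
    const std::uint32_t product = b * c; // mul
    return a - product;                  // sub
}

static_assert(msub32(3, 4, 20) == 8, "20 - 3 * 4 == 8");
static_assert(msub32(3, 4, 20) == mul_then_sub(20, 3, 4), "same result as mul + sub");
```

Note the operand order: the minuend `a` is the last operand of `msub`, which matches the diffs in this PR (for example `msub w0, w1, w2, w0`).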

Acceptance Criteria

~~Add Tests~~ (asmdiffs cover this; also, "ARM64 - Optimizing a % b operations part 2" #66407 includes tests)

Some ARM64 diffs

             cmp     w3, #0
             beq     G_M9825_IG04
             udiv    w4, w1, w3
-            mul     w3, w4, w3
-            sub     w3, w1, w3
+            msub    w3, w4, w3, w1
             str     w3, [x2]
             ldr     x2, [x19]
             ; byrRegs -[x2]
             bl      CORINFO_HELP_LDELEMA_REF
             ; gcrRegs -[x0]
             ; byrRegs +[x0]
-						;; bbWeight=1    PerfScore 42.50
+						;; bbWeight=1    PerfScore 42.00

-            mul     w1, w1, w2
-            sub     w0, w0, w1
-						;; bbWeight=1    PerfScore 2.50
+            msub    w0, w1, w2, w0
+						;; bbWeight=1    PerfScore 2.00

             sdiv    w2, w0, w1
-            mul     w1, w2, w1
-            sub     w0, w0, w1
-						;; bbWeight=1    PerfScore 13.50
+            msub    w0, w2, w1, w0
+						;; bbWeight=1    PerfScore 13.00
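
The first and third diffs above follow a `udiv`/`sdiv`, which is the shape of a remainder computation: ARM64 has no integer remainder instruction, so `a % b` is computed as `a - (a / b) * b`. A minimal sketch of that identity, with hypothetical helper names and the assumption that the caller has already excluded `b == 0` (and `INT32_MIN / -1` for the signed case):

```cpp
#include <cstdint>

// Hypothetical helper mirroring `sdiv` followed by `msub`:
//   q = a / b        (sdiv, truncates toward zero)
//   r = a - q * b    (msub)
// Assumes b != 0 and that a / b does not overflow.
constexpr std::int32_t rem_via_msub(std::int32_t a, std::int32_t b)
{
    const std::int32_t q = a / b; // sdiv
    return a - q * b;             // msub
}

// Unsigned variant mirroring `udiv` followed by `msub`. Assumes b != 0.
constexpr std::uint32_t urem_via_msub(std::uint32_t a, std::uint32_t b)
{
    const std::uint32_t q = a / b; // udiv
    return a - q * b;              // msub
}

static_assert(rem_via_msub(17, 5) == 17 % 5, "positive dividend");
static_assert(rem_via_msub(-17, 5) == -17 % 5, "negative dividend");
static_assert(urem_via_msub(17u, 5u) == 17u % 5u, "unsigned");
```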

ghost · 2022-03-14T23:40:12Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Author:	TIHan
Assignees:	TIHan
Labels:	`area-CodeGen-coreclr`
Milestone:	-

TIHan · 2022-03-14T23:57:22Z

@kunalspathak @jakobbotsch This is ready.

EgorBo · 2022-03-15T10:27:22Z

src/coreclr/jit/codegenarm64.cpp

+// Arguments:
+//     tree - GT_MSUB tree where op2 is GT_MUL
+//
+void CodeGen::genCodeForMsub(GenTreeOp* tree)


Have you considered just handling it as part of GT_MADD? I bet it would take far fewer lines of code.

I did consider it - I guess it's a design choice whether or not to use GT_MADD or introduce GT_MSUB.

I'm actually conflicted on it. On the one hand, just using GT_MADD is sufficient, but on the other hand, the codegen for GT_MADD is a bit complicated since we mark GT_NEG nodes as contained.

The goal of introducing GT_MSUB and its code-gen was to make it easy to understand.

GT_MADD is a lowering-only op that is specific to ARM64, and its name reflects the actual instruction 'madd'. At least for me, I wouldn't expect it to emit 'msub', though I totally understand why it does.

Can we consider actually just making GT_MADD only emit 'madd'? Then you could make the decision to use GT_MADD or GT_MSUB in lowering rather than in code-gen. It would make containment less complicated and code-gen simpler.

> Can we consider actually just making GT_MADD only emit 'madd'

I like this and it makes it have "parity" with GT_ADD and GT_SUB.
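
To make the suggestion in this thread concrete, here is a minimal sketch of choosing between a fused add and a fused subtract during lowering, so that code generation can map each node to exactly one instruction. The `Node`, `Op`, `make`, and `lowerMulAddSub` names are hypothetical toy types, not the JIT's `GenTree` API, and the sketch ignores the integral-type and overflow checks that the real change has to perform.

```cpp
#include <memory>
#include <utility>

// Toy IR for illustration only; these types are NOT the JIT's GenTree.
enum class Op { Leaf, Add, Sub, Mul, Madd, Msub };

struct Node
{
    Op op = Op::Leaf;
    std::unique_ptr<Node> op1; // for Madd/Msub: the addend or minuend `a`
    std::unique_ptr<Node> op2; // for Madd/Msub: the Mul node `b * c`
};

std::unique_ptr<Node> make(Op op, std::unique_ptr<Node> a = nullptr, std::unique_ptr<Node> b = nullptr)
{
    auto n = std::make_unique<Node>();
    n->op  = op;
    n->op1 = std::move(a);
    n->op2 = std::move(b);
    return n;
}

// Pick the fused op in "lowering" so that code generation can emit
// Madd -> madd and Msub -> msub one-to-one.
void lowerMulAddSub(Node& n)
{
    if (n.op == Op::Sub && n.op2 && n.op2->op == Op::Mul)
    {
        n.op = Op::Msub; // a - b * c
    }
    else if (n.op == Op::Add && n.op2 && n.op2->op == Op::Mul)
    {
        n.op = Op::Madd; // a + b * c
    }
    else if (n.op == Op::Add && n.op1 && n.op1->op == Op::Mul)
    {
        std::swap(n.op1, n.op2); // b * c + a: addition commutes
        n.op = Op::Madd;
    }
    // (b * c) - a is left alone: msub computes a - b * c, not b * c - a.
}
```

The `op2`-is-the-multiply convention mirrors the comment on `genCodeForMsub` quoted above ("GT_MSUB tree where op2 is GT_MUL"); everything else in the sketch is an assumption made for illustration.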

TIHan · 2022-04-11T17:33:16Z

@kunalspathak @EgorBo This is ready. CI is again failing due to unrelated reasons.

I know I'm adding GT_MSUB while GT_MADD can emit either madd or msub - but I'm willing to do a follow-up PR to make GT_MADD just emit madd and do the swapping and stuff in lowering to emit GT_MSUB so there is less confusion between the two.
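
For context on why a multiply-add node can end up as a subtract at all: with wrapping integer arithmetic, adding a product whose operand is negated is the same as subtracting the product. The sketch below checks that identity; the function names are hypothetical, and the link to how GT_MADD handles a contained GT_NEG is an inference from the discussion above rather than something this PR's code confirms.

```cpp
#include <cstdint>

// ADD(a, MUL(NEG(b), c)): a multiply-add whose multiplicand is negated.
constexpr std::uint32_t madd_with_neg(std::uint32_t a, std::uint32_t b, std::uint32_t c)
{
    return a + (0u - b) * c;
}

// SUB(a, MUL(b, c)): the shape that maps directly onto msub.
constexpr std::uint32_t msub_form(std::uint32_t a, std::uint32_t b, std::uint32_t c)
{
    return a - b * c;
}

// Both forms agree modulo 2^32, including when the product wraps.
static_assert(madd_with_neg(20, 3, 4) == msub_form(20, 3, 4), "small values");
static_assert(madd_with_neg(1, 0xFFFFFFFFu, 7) == msub_form(1, 0xFFFFFFFFu, 7), "wrapping values");
```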

kunalspathak · 2022-04-11T19:47:47Z

> I know I'm adding GT_MSUB while GT_MADD can emit either madd or msub - but I'm willing to do a follow-up PR to make GT_MADD just emit madd and do the swapping and stuff in lowering to emit GT_MSUB so there is less confusion between the two.

Sounds good to me.

kunalspathak

LGTM

TIHan · 2022-04-11T20:14:09Z

Made a follow-up issue: #67869

DrewScoggins · 2022-04-14T16:43:36Z

Windows-Arm64 Improvements: dotnet/perf-autofiling-issues#4624

dakersnar · 2022-04-21T20:43:40Z

More Windows-Arm64 Improvements: dotnet/perf-autofiling-issues#4733

AndyAyersMS · 2022-04-22T18:20:02Z

Ubuntu arm64 regression: dotnet/perf-autofiling-issues#4737

tannergooding · 2022-04-22T18:22:40Z

Would be interesting to see the codegen difference: https://github.com/dotnet/performance/blob/main/src/benchmarks/micro/runtime/Benchstones/BenchI/Pi.cs

I'd guess this is more likely due to some subtle loop alignment change from the smaller instruction sequence

Emitting MSUB for ARM64

09dc7e3

dotnet-issue-labeler bot added the `area-CodeGen-coreclr` label Mar 14, 2022

ghost assigned TIHan Mar 14, 2022

TIHan added 2 commits March 14, 2022 16:52

Update codegenarm64.cpp

0247254

Updated comments

ff8ee6f

Formatting

39d098c

kunalspathak reviewed Mar 15, 2022

src/coreclr/jit/lowerarmarch.cpp (outdated, resolved)

EgorBo reviewed Mar 15, 2022

TIHan added 3 commits March 15, 2022 14:01

Combining some logic for GT_MADD and GT_MSUB

f006f8e

Merge remote-tracking branch 'upstream/main' into arm64-msub

1acab50

Can only use gtOverflow after checking to see if it's a valid operator that supports overflow

4655c4f

kunalspathak approved these changes Apr 11, 2022

TIHan merged commit cfd1241 into dotnet:main Apr 12, 2022

tannergooding mentioned this pull request Apr 14, 2022

Review the multi-op instruction usage for Arm64 #68028

Open

28 tasks

This was referenced Apr 22, 2022

[Perf] Changes at 4/12/2022 9:40:49 PM dotnet/perf-autofiling-issues#4757

Closed

[Perf] Changes at 4/12/2022 9:40:49 PM dotnet/perf-autofiling-issues#4737

Closed

JulieLeeMSFT mentioned this pull request Apr 29, 2022

What's new in .NET 7 Preview 4 [WIP] dotnet/core#7378

Closed

ghost locked as resolved and limited conversation to collaborators May 22, 2022
