Arm64: Readjust the heuristics to materialize double constants using fmov #64794

kunalspathak · 2022-02-04T06:28:20Z

Currently, if a double constant is something that can be encoded using fmov, we always load it using fmov without trying to CSE it. For cases where the occurrence of a constant is in a loop, we load it in every iteration. Below is an amplified version of the problem to understand how we generate today

public static double issue1(int n ,double y)
{
    double result = 0.0;
    for (int i = 0; i < n; i++)
    {
        result += y;
        if (result == 4.5)
        {
            result -= 4.5;
        }
        else if (result < 4.5)
        {
            result += 4.5;
        }
        else if (result > 4.5)
        {
            result += (4.5 + y);
        }
    }
    return result;
}

G_M23855_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
                                                ;; bbWeight=1    PerfScore 1.50
G_M23855_IG02:              ;; offset=0008H
        4F00E410          movi    v16.16b, #0x00
        2A1F03E1          mov     w1, wzr
        7100001F          cmp     w0, #0
        540002CD          ble     G_M23855_IG08
                          align   [0 bytes for IG03]
                          align   [0 bytes]
                          align   [0 bytes]
                          align   [0 bytes]
                                                ;; bbWeight=1    PerfScore 2.50
G_M23855_IG03:              ;; offset=0018H
        1E602A10          fadd    d16, d16, d0
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        54000061          bne     G_M23855_IG05
                                                ;; bbWeight=4    PerfScore 22.00
G_M23855_IG04:              ;; offset=0028H
        4F00E410          movi    v16.16b, #0x00
        1400000D          b       G_M23855_IG07
                                                ;; bbWeight=2    PerfScore 3.00
G_M23855_IG05:              ;; offset=0030H
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        54000082          bhs     G_M23855_IG06
        1E625011          fmov    d17, #4.5000
        1E712A10          fadd    d16, d16, d17
        14000007          b       G_M23855_IG07
                                                ;; bbWeight=2    PerfScore 14.00
G_M23855_IG06:              ;; offset=0048H
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        5400008D          ble     G_M23855_IG07
        1E625011          fmov    d17, #4.5000
        1E712811          fadd    d17, d0, d17
        1E712A10          fadd    d16, d16, d17
                                                ;; bbWeight=2    PerfScore 18.00
G_M23855_IG07:              ;; offset=0060H
        11000421          add     w1, w1, #1
        6B00003F          cmp     w1, w0
        54FFFD8B          blt     G_M23855_IG03
                                                ;; bbWeight=4    PerfScore 8.00
G_M23855_IG08:              ;; offset=006CH
        0EB01E00          mov     v0.8b, v16.8b
                                                ;; bbWeight=1    PerfScore 0.50
G_M23855_IG09:              ;; offset=0070H
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
                                                ;; bbWeight=1    PerfScore 2.00

On x64, we do the right thing - https://godbolt.org/z/67h7rzWaK

We need to adjust the heuristic below to determine if it is profitable to use fmov or not. One criteria would be if it the usage is inside a hot block (compCurBB->bbWeight > BB_UNITY_WEIGHT), then do not do load it using fmov.

runtime/src/coreclr/jit/gentree.cpp

Lines 3489 to 3494 in 7065279

    
           if ((*((__int64*)&(tree->AsDblCon()->gtDconVal)) == 0) || 
        
               emitter::emitIns_valid_imm_for_fmov(tree->AsDblCon()->gtDconVal)) 
        
           { 
        
               costEx = 1; 
        
               costSz = 1; 
        
           }

Perhaps, this can adversely affect the code because of increased register pressure, but we should validate if that is the case.

The text was updated successfully, but these errors were encountered:

ghost · 2022-02-04T06:28:34Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Currently, if a double constant is something that can be encoded using fmov, we always load it using fmov without trying to CSE it. For cases where the occurrence of a constant is in a loop, we load it in every iteration. Below is an amplified version of the problem to understand how we generate today

public static double issue1(int n ,double y)
{
    double result = 0.0;
    for (int i = 0; i < n; i++)
    {
        result += y;
        if (result == 4.5)
        {
            result -= 4.5;
        }
        else if (result < 4.5)
        {
            result += 4.5;
        }
        else if (result > 4.5)
        {
            result += (4.5 + y);
        }
    }
    return result;
}

G_M23855_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
                                                ;; bbWeight=1    PerfScore 1.50
G_M23855_IG02:              ;; offset=0008H
        4F00E410          movi    v16.16b, #0x00
        2A1F03E1          mov     w1, wzr
        7100001F          cmp     w0, #0
        540002CD          ble     G_M23855_IG08
                          align   [0 bytes for IG03]
                          align   [0 bytes]
                          align   [0 bytes]
                          align   [0 bytes]
                                                ;; bbWeight=1    PerfScore 2.50
G_M23855_IG03:              ;; offset=0018H
        1E602A10          fadd    d16, d16, d0
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        54000061          bne     G_M23855_IG05
                                                ;; bbWeight=4    PerfScore 22.00
G_M23855_IG04:              ;; offset=0028H
        4F00E410          movi    v16.16b, #0x00
        1400000D          b       G_M23855_IG07
                                                ;; bbWeight=2    PerfScore 3.00
G_M23855_IG05:              ;; offset=0030H
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        54000082          bhs     G_M23855_IG06
        1E625011          fmov    d17, #4.5000
        1E712A10          fadd    d16, d16, d17
        14000007          b       G_M23855_IG07
                                                ;; bbWeight=2    PerfScore 14.00
G_M23855_IG06:              ;; offset=0048H
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        5400008D          ble     G_M23855_IG07
        1E625011          fmov    d17, #4.5000
        1E712811          fadd    d17, d0, d17
        1E712A10          fadd    d16, d16, d17
                                                ;; bbWeight=2    PerfScore 18.00
G_M23855_IG07:              ;; offset=0060H
        11000421          add     w1, w1, #1
        6B00003F          cmp     w1, w0
        54FFFD8B          blt     G_M23855_IG03
                                                ;; bbWeight=4    PerfScore 8.00
G_M23855_IG08:              ;; offset=006CH
        0EB01E00          mov     v0.8b, v16.8b
                                                ;; bbWeight=1    PerfScore 0.50
G_M23855_IG09:              ;; offset=0070H
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
                                                ;; bbWeight=1    PerfScore 2.00

On x64, we do the right thing - https://godbolt.org/z/67h7rzWaK

We need to adjust the heuristic below to determine if it is profitable to use fmov or not. One criteria would be if it the usage is inside a hot block (compCurBB->bbWeight > BB_UNITY_WEIGHT), then do not do load it using fmov.

runtime/src/coreclr/jit/gentree.cpp

Lines 3489 to 3494 in 7065279

    
           if ((*((__int64*)&(tree->AsDblCon()->gtDconVal)) == 0) || 
        
               emitter::emitIns_valid_imm_for_fmov(tree->AsDblCon()->gtDconVal)) 
        
           { 
        
               costEx = 1; 
        
               costSz = 1; 
        
           }

Perhaps, this can adversely affect the code because of increased register pressure, but we should validate if that is the case.

Author:	kunalspathak
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `untriaged`
Milestone:	-

kunalspathak · 2022-02-04T06:28:50Z

@dotnet/jit-contrib

kunalspathak · 2022-02-04T06:38:31Z

Ah, GH was down when I submitted the issue and seems I hit it multiple times, so it created it multiple times.

kunalspathak · 2022-02-04T18:00:57Z

I forgot that I filed similar issue at #35257. Closing this one.

dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Feb 4, 2022

kunalspathak assigned TIHan Feb 4, 2022

kunalspathak closed this as completed Feb 4, 2022

kunalspathak reopened this Feb 4, 2022

kunalspathak closed this as completed Feb 4, 2022

TIHan mentioned this issue Feb 11, 2022

ARM64 - Allow double constants to CSE more often #65176

Closed

1 task

ghost locked as resolved and limited conversation to collaborators Mar 6, 2022

JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Apr 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Arm64: Readjust the heuristics to materialize double constants using fmov #64794

Arm64: Readjust the heuristics to materialize double constants using fmov #64794

kunalspathak commented Feb 4, 2022

ghost commented Feb 4, 2022

kunalspathak commented Feb 4, 2022

kunalspathak commented Feb 4, 2022

kunalspathak commented Feb 4, 2022

Arm64: Readjust the heuristics to materialize double constants using fmov #64794

Arm64: Readjust the heuristics to materialize double constants using fmov #64794

Comments

kunalspathak commented Feb 4, 2022

ghost commented Feb 4, 2022

kunalspathak commented Feb 4, 2022

kunalspathak commented Feb 4, 2022

kunalspathak commented Feb 4, 2022