Arm64: Readjust the heuristics to materialize double constants using fmov #64791

kunalspathak · 2022-02-04T05:59:26Z

Currently, if a double constant is something that can be encoded using fmov, we always load it using fmov without trying to CSE it. For cases where the occurrence of a constant is in a loop, we load it in every iteration. Below is an amplified version of the problem to understand how we generate today

public static double issue1(int n ,double y)
{
    double result = 0.0;
    for (int i = 0; i < n; i++)
    {
        result += y;
        if (result == 4.5)
        {
            result -= 4.5;
        }
        else if (result < 4.5)
        {
            result += 4.5;
        }
        else if (result > 4.5)
        {
            result += (4.5 + y);
        }
    }
    return result;
}

G_M23855_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
                                                ;; bbWeight=1    PerfScore 1.50
G_M23855_IG02:              ;; offset=0008H
        4F00E410          movi    v16.16b, #0x00
        2A1F03E1          mov     w1, wzr
        7100001F          cmp     w0, #0
        540002CD          ble     G_M23855_IG08
                          align   [0 bytes for IG03]
                          align   [0 bytes]
                          align   [0 bytes]
                          align   [0 bytes]
                                                ;; bbWeight=1    PerfScore 2.50
G_M23855_IG03:              ;; offset=0018H
        1E602A10          fadd    d16, d16, d0
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        54000061          bne     G_M23855_IG05
                                                ;; bbWeight=4    PerfScore 22.00
G_M23855_IG04:              ;; offset=0028H
        4F00E410          movi    v16.16b, #0x00
        1400000D          b       G_M23855_IG07
                                                ;; bbWeight=2    PerfScore 3.00
G_M23855_IG05:              ;; offset=0030H
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        54000082          bhs     G_M23855_IG06
        1E625011          fmov    d17, #4.5000
        1E712A10          fadd    d16, d16, d17
        14000007          b       G_M23855_IG07
                                                ;; bbWeight=2    PerfScore 14.00
G_M23855_IG06:              ;; offset=0048H
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        5400008D          ble     G_M23855_IG07
        1E625011          fmov    d17, #4.5000
        1E712811          fadd    d17, d0, d17
        1E712A10          fadd    d16, d16, d17
                                                ;; bbWeight=2    PerfScore 18.00
G_M23855_IG07:              ;; offset=0060H
        11000421          add     w1, w1, #1
        6B00003F          cmp     w1, w0
        54FFFD8B          blt     G_M23855_IG03
                                                ;; bbWeight=4    PerfScore 8.00
G_M23855_IG08:              ;; offset=006CH
        0EB01E00          mov     v0.8b, v16.8b
                                                ;; bbWeight=1    PerfScore 0.50
G_M23855_IG09:              ;; offset=0070H
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
                                                ;; bbWeight=1    PerfScore 2.00

On x64, we do the right thing - https://godbolt.org/z/67h7rzWaK

We need to adjust the heuristic below to determine if it is profitable to use fmov or not. One criteria would be if it the usage is inside a hot block (compCurBB->bbWeight > BB_UNITY_WEIGHT), then do not do load it using fmov.

runtime/src/coreclr/jit/gentree.cpp

Lines 3489 to 3494 in 7065279

    
           if ((*((__int64*)&(tree->AsDblCon()->gtDconVal)) == 0) || 
        
               emitter::emitIns_valid_imm_for_fmov(tree->AsDblCon()->gtDconVal)) 
        
           { 
        
               costEx = 1; 
        
               costSz = 1; 
        
           }

Perhaps, this can adversely affect the code because of increased register pressure, but we should validate if that is the case.

The text was updated successfully, but these errors were encountered:

jkotas · 2022-02-04T06:38:01Z

Dup of #64794

jkotas closed this as completed Feb 4, 2022

ghost locked as resolved and limited conversation to collaborators Apr 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Arm64: Readjust the heuristics to materialize double constants using fmov #64791

Arm64: Readjust the heuristics to materialize double constants using fmov #64791

kunalspathak commented Feb 4, 2022

jkotas commented Feb 4, 2022

Arm64: Readjust the heuristics to materialize double constants using fmov #64791

Arm64: Readjust the heuristics to materialize double constants using fmov #64791

Comments

kunalspathak commented Feb 4, 2022

jkotas commented Feb 4, 2022