Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arm64: Readjust the heuristics to materialize double constants using fmov #64791

Closed
kunalspathak opened this issue Feb 4, 2022 · 0 comments
Closed

Comments

@kunalspathak
Copy link
Member

Currently, if a double constant is something that can be encoded using fmov, we always load it using fmov without trying to CSE it. For cases where the occurrence of a constant is in a loop, we load it in every iteration. Below is an amplified version of the problem to understand how we generate today

public static double issue1(int n ,double y)
{
    double result = 0.0;
    for (int i = 0; i < n; i++)
    {
        result += y;
        if (result == 4.5)
        {
            result -= 4.5;
        }
        else if (result < 4.5)
        {
            result += 4.5;
        }
        else if (result > 4.5)
        {
            result += (4.5 + y);
        }
    }
    return result;
}
G_M23855_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
                                                ;; bbWeight=1    PerfScore 1.50
G_M23855_IG02:              ;; offset=0008H
        4F00E410          movi    v16.16b, #0x00
        2A1F03E1          mov     w1, wzr
        7100001F          cmp     w0, #0
        540002CD          ble     G_M23855_IG08
                          align   [0 bytes for IG03]
                          align   [0 bytes]
                          align   [0 bytes]
                          align   [0 bytes]
                                                ;; bbWeight=1    PerfScore 2.50
G_M23855_IG03:              ;; offset=0018H
        1E602A10          fadd    d16, d16, d0
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        54000061          bne     G_M23855_IG05
                                                ;; bbWeight=4    PerfScore 22.00
G_M23855_IG04:              ;; offset=0028H
        4F00E410          movi    v16.16b, #0x00
        1400000D          b       G_M23855_IG07
                                                ;; bbWeight=2    PerfScore 3.00
G_M23855_IG05:              ;; offset=0030H
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        54000082          bhs     G_M23855_IG06
        1E625011          fmov    d17, #4.5000
        1E712A10          fadd    d16, d16, d17
        14000007          b       G_M23855_IG07
                                                ;; bbWeight=2    PerfScore 14.00
G_M23855_IG06:              ;; offset=0048H
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        5400008D          ble     G_M23855_IG07
        1E625011          fmov    d17, #4.5000
        1E712811          fadd    d17, d0, d17
        1E712A10          fadd    d16, d16, d17
                                                ;; bbWeight=2    PerfScore 18.00
G_M23855_IG07:              ;; offset=0060H
        11000421          add     w1, w1, #1
        6B00003F          cmp     w1, w0
        54FFFD8B          blt     G_M23855_IG03
                                                ;; bbWeight=4    PerfScore 8.00
G_M23855_IG08:              ;; offset=006CH
        0EB01E00          mov     v0.8b, v16.8b
                                                ;; bbWeight=1    PerfScore 0.50
G_M23855_IG09:              ;; offset=0070H
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
                                                ;; bbWeight=1    PerfScore 2.00

On x64, we do the right thing - https://godbolt.org/z/67h7rzWaK

We need to adjust the heuristic below to determine if it is profitable to use fmov or not. One criteria would be if it the usage is inside a hot block (compCurBB->bbWeight > BB_UNITY_WEIGHT), then do not do load it using fmov.

if ((*((__int64*)&(tree->AsDblCon()->gtDconVal)) == 0) ||
emitter::emitIns_valid_imm_for_fmov(tree->AsDblCon()->gtDconVal))
{
costEx = 1;
costSz = 1;
}

Perhaps, this can adversely affect the code because of increased register pressure, but we should validate if that is the case.

@jkotas
Copy link
Member

jkotas commented Feb 4, 2022

Dup of #64794

@jkotas jkotas closed this as completed Feb 4, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Apr 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants