Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arm64: Readjust the heuristics to materialize double constants using fmov #64794

Closed
kunalspathak opened this issue Feb 4, 2022 · 4 comments
Closed
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Comments

@kunalspathak
Copy link
Member

Currently, if a double constant is something that can be encoded using fmov, we always load it using fmov without trying to CSE it. For cases where the occurrence of a constant is in a loop, we load it in every iteration. Below is an amplified version of the problem to understand how we generate today

public static double issue1(int n ,double y)
{
    double result = 0.0;
    for (int i = 0; i < n; i++)
    {
        result += y;
        if (result == 4.5)
        {
            result -= 4.5;
        }
        else if (result < 4.5)
        {
            result += 4.5;
        }
        else if (result > 4.5)
        {
            result += (4.5 + y);
        }
    }
    return result;
}
G_M23855_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
                                                ;; bbWeight=1    PerfScore 1.50
G_M23855_IG02:              ;; offset=0008H
        4F00E410          movi    v16.16b, #0x00
        2A1F03E1          mov     w1, wzr
        7100001F          cmp     w0, #0
        540002CD          ble     G_M23855_IG08
                          align   [0 bytes for IG03]
                          align   [0 bytes]
                          align   [0 bytes]
                          align   [0 bytes]
                                                ;; bbWeight=1    PerfScore 2.50
G_M23855_IG03:              ;; offset=0018H
        1E602A10          fadd    d16, d16, d0
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        54000061          bne     G_M23855_IG05
                                                ;; bbWeight=4    PerfScore 22.00
G_M23855_IG04:              ;; offset=0028H
        4F00E410          movi    v16.16b, #0x00
        1400000D          b       G_M23855_IG07
                                                ;; bbWeight=2    PerfScore 3.00
G_M23855_IG05:              ;; offset=0030H
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        54000082          bhs     G_M23855_IG06
        1E625011          fmov    d17, #4.5000
        1E712A10          fadd    d16, d16, d17
        14000007          b       G_M23855_IG07
                                                ;; bbWeight=2    PerfScore 14.00
G_M23855_IG06:              ;; offset=0048H
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        5400008D          ble     G_M23855_IG07
        1E625011          fmov    d17, #4.5000
        1E712811          fadd    d17, d0, d17
        1E712A10          fadd    d16, d16, d17
                                                ;; bbWeight=2    PerfScore 18.00
G_M23855_IG07:              ;; offset=0060H
        11000421          add     w1, w1, #1
        6B00003F          cmp     w1, w0
        54FFFD8B          blt     G_M23855_IG03
                                                ;; bbWeight=4    PerfScore 8.00
G_M23855_IG08:              ;; offset=006CH
        0EB01E00          mov     v0.8b, v16.8b
                                                ;; bbWeight=1    PerfScore 0.50
G_M23855_IG09:              ;; offset=0070H
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
                                                ;; bbWeight=1    PerfScore 2.00

On x64, we do the right thing - https://godbolt.org/z/67h7rzWaK

We need to adjust the heuristic below to determine if it is profitable to use fmov or not. One criteria would be if it the usage is inside a hot block (compCurBB->bbWeight > BB_UNITY_WEIGHT), then do not do load it using fmov.

if ((*((__int64*)&(tree->AsDblCon()->gtDconVal)) == 0) ||
emitter::emitIns_valid_imm_for_fmov(tree->AsDblCon()->gtDconVal))
{
costEx = 1;
costSz = 1;
}

Perhaps, this can adversely affect the code because of increased register pressure, but we should validate if that is the case.

@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Feb 4, 2022
@ghost
Copy link

ghost commented Feb 4, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Currently, if a double constant is something that can be encoded using fmov, we always load it using fmov without trying to CSE it. For cases where the occurrence of a constant is in a loop, we load it in every iteration. Below is an amplified version of the problem to understand how we generate today

public static double issue1(int n ,double y)
{
    double result = 0.0;
    for (int i = 0; i < n; i++)
    {
        result += y;
        if (result == 4.5)
        {
            result -= 4.5;
        }
        else if (result < 4.5)
        {
            result += 4.5;
        }
        else if (result > 4.5)
        {
            result += (4.5 + y);
        }
    }
    return result;
}
G_M23855_IG01:              ;; offset=0000H
        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
                                                ;; bbWeight=1    PerfScore 1.50
G_M23855_IG02:              ;; offset=0008H
        4F00E410          movi    v16.16b, #0x00
        2A1F03E1          mov     w1, wzr
        7100001F          cmp     w0, #0
        540002CD          ble     G_M23855_IG08
                          align   [0 bytes for IG03]
                          align   [0 bytes]
                          align   [0 bytes]
                          align   [0 bytes]
                                                ;; bbWeight=1    PerfScore 2.50
G_M23855_IG03:              ;; offset=0018H
        1E602A10          fadd    d16, d16, d0
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        54000061          bne     G_M23855_IG05
                                                ;; bbWeight=4    PerfScore 22.00
G_M23855_IG04:              ;; offset=0028H
        4F00E410          movi    v16.16b, #0x00
        1400000D          b       G_M23855_IG07
                                                ;; bbWeight=2    PerfScore 3.00
G_M23855_IG05:              ;; offset=0030H
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        54000082          bhs     G_M23855_IG06
        1E625011          fmov    d17, #4.5000
        1E712A10          fadd    d16, d16, d17
        14000007          b       G_M23855_IG07
                                                ;; bbWeight=2    PerfScore 14.00
G_M23855_IG06:              ;; offset=0048H
        1E625011          fmov    d17, #4.5000
        1E712200          fcmp    d16, d17
        5400008D          ble     G_M23855_IG07
        1E625011          fmov    d17, #4.5000
        1E712811          fadd    d17, d0, d17
        1E712A10          fadd    d16, d16, d17
                                                ;; bbWeight=2    PerfScore 18.00
G_M23855_IG07:              ;; offset=0060H
        11000421          add     w1, w1, #1
        6B00003F          cmp     w1, w0
        54FFFD8B          blt     G_M23855_IG03
                                                ;; bbWeight=4    PerfScore 8.00
G_M23855_IG08:              ;; offset=006CH
        0EB01E00          mov     v0.8b, v16.8b
                                                ;; bbWeight=1    PerfScore 0.50
G_M23855_IG09:              ;; offset=0070H
        A8C17BFD          ldp     fp, lr, [sp],#16
        D65F03C0          ret     lr
                                                ;; bbWeight=1    PerfScore 2.00

On x64, we do the right thing - https://godbolt.org/z/67h7rzWaK

We need to adjust the heuristic below to determine if it is profitable to use fmov or not. One criteria would be if it the usage is inside a hot block (compCurBB->bbWeight > BB_UNITY_WEIGHT), then do not do load it using fmov.

if ((*((__int64*)&(tree->AsDblCon()->gtDconVal)) == 0) ||
emitter::emitIns_valid_imm_for_fmov(tree->AsDblCon()->gtDconVal))
{
costEx = 1;
costSz = 1;
}

Perhaps, this can adversely affect the code because of increased register pressure, but we should validate if that is the case.

Author: kunalspathak
Assignees: -
Labels:

area-CodeGen-coreclr, untriaged

Milestone: -

@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib

@kunalspathak
Copy link
Member Author

Ah, GH was down when I submitted the issue and seems I hit it multiple times, so it created it multiple times.

@kunalspathak
Copy link
Member Author

I forgot that I filed similar issue at #35257. Closing this one.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 6, 2022
@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Apr 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

3 participants