Loop condition vs initial guard check impacts bounds checking #83349

stephentoub · 2023-03-13T19:19:57Z

Consider these two functionally identical loops:

    public static int M1(int i, ReadOnlySpan<char> src)
    {    
        int sum = 0;
        while ((uint)i < (uint)src.Length)
        {
            sum += src[i++];
        }
        return sum;
    }
    
    public static int M2(int i, ReadOnlySpan<char> src)
    {    
        int sum = 0;
        while (true)
        {
            if ((uint)i >= (uint)src.Length) break;

            sum += src[i++];
        }
        return sum;
    }

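The second simply moves the loop condition to be the very first statement in the body; SharpLab even decompiles both to identical C#. However, the JIT keeps the bounds check inside the loop body of the former (the second `cmp ecx, edx` / `jae` pair, which branches to the throw helper), whereas it correctly eliminates it for the latter: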
SharpLab

C.M1(Int32, System.ReadOnlySpan`1<Char>)
    L0000: sub rsp, 0x28
    L0004: mov rax, [rdx]
    L0007: mov edx, [rdx+8]
    L000a: xor r8d, r8d
    L000d: cmp ecx, edx
    L000f: jae short L0039
    L0011: nop [rax]
    L0018: nop [rax+rax]
    L0020: lea r9d, [rcx+1]
    L0024: cmp ecx, edx
    L0026: jae short L0041
    L0028: mov ecx, ecx
    L002a: movzx ecx, word ptr [rax+rcx*2]
    L002e: add r8d, ecx
    L0031: cmp r9d, edx
    L0034: mov ecx, r9d
    L0037: jb short L0020
    L0039: mov eax, r8d
    L003c: add rsp, 0x28
    L0040: ret
    L0041: call 0x00007fff38258b30
    L0046: int3

C.M2(Int32, System.ReadOnlySpan`1<Char>)
    L0000: mov rax, [rdx]
    L0003: mov edx, [rdx+8]
    L0006: xor r8d, r8d
    L0009: cmp ecx, edx
    L000b: jae short L001f
    L000d: lea r9d, [rcx+1]
    L0011: mov ecx, ecx
    L0013: movzx ecx, word ptr [rax+rcx*2]
    L0017: add r8d, ecx
    L001a: mov ecx, r9d
    L001d: jmp short L0009
    L001f: mov eax, r8d
    L0022: ret
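
To sanity-check that the two loops really do compute the same result, a minimal sketch like the following compares them across a range of start indices, including negative and out-of-range ones. The `LoopCheck` class, the `Agree` helper, and the sample inputs are assumptions made for illustration; they are not part of the original report.

```csharp
using System;

public static class LoopCheck
{
    // Loop-condition form from the report.
    public static int M1(int i, ReadOnlySpan<char> src)
    {
        int sum = 0;
        while ((uint)i < (uint)src.Length)
        {
            sum += src[i++];
        }
        return sum;
    }

    // Guard-at-top-of-body form from the report.
    public static int M2(int i, ReadOnlySpan<char> src)
    {
        int sum = 0;
        while (true)
        {
            if ((uint)i >= (uint)src.Length) break;
            sum += src[i++];
        }
        return sum;
    }

    // Hypothetical helper: true if M1 and M2 agree for every start index
    // in [-2, text.Length + 2], covering negative and past-the-end values.
    public static bool Agree(string text)
    {
        for (int start = -2; start <= text.Length + 2; start++)
        {
            if (M1(start, text) != M2(start, text)) return false;
        }
        return true;
    }
}
```

For example, `LoopCheck.M1(0, "abc")` and `LoopCheck.M2(0, "abc")` both return 294 (97 + 98 + 99), and both return 0 for a negative start index because the cast to `uint` makes the index compare greater than any length.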

ghost · 2023-03-13T19:20:14Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.

Author:	stephentoub
Assignees:	-
Labels:	`tenet-performance`, `area-CodeGen-coreclr`
Milestone:	8.0.0

EgorBo · 2023-03-13T19:38:48Z

    public static int M1(int i, ReadOnlySpan<char> src)
    {
        int sum = 0;
        while (i >= 0 && i < src.Length)
        {
            sum += src[i++];
        }
        return sum;
    }

this works 🙂

stephentoub · 2023-04-03T14:31:08Z

> this works 🙂

Sort of... it eliminates the comparison as part of the bounds check, but it still has an extra comparison/branch for the i >= 0.

C.M1(Int32, System.ReadOnlySpan`1<Char>)
    L0000: mov rax, [rdx]
    L0003: mov edx, [rdx+8]
    L0006: xor r8d, r8d
    L0009: jmp short L001b
    L000b: lea r9d, [rcx+1]
    L000f: mov ecx, ecx
    L0011: movzx ecx, word ptr [rax+rcx*2]
    L0015: add r8d, ecx
    L0018: mov ecx, r9d
    L001b: test ecx, ecx
    L001d: jl short L0023
    L001f: cmp ecx, edx
    L0021: jl short L000b
    L0023: mov eax, r8d
    L0026: ret

In contrast, this:

    public static int M2(int i, ReadOnlySpan<char> src)
    {
        int sum = 0;
        while (true)
        {
            if ((uint)i >= (uint)src.Length) break;
            sum += src[i++];
        }
        return sum;
    }

is:

C.M2(Int32, System.ReadOnlySpan`1<Char>)
    L0000: mov rax, [rdx]
    L0003: mov edx, [rdx+8]
    L0006: xor r8d, r8d
    L0009: cmp ecx, edx
    L000b: jae short L001f
    L000d: lea r9d, [rcx+1]
    L0011: mov ecx, ecx
    L0013: movzx ecx, word ptr [rax+rcx*2]
    L0017: add r8d, ecx
    L001a: mov ecx, r9d
    L001d: jmp short L0009
    L001f: mov eax, r8d
    L0022: ret
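
The gap between the two condition styles comes down to the unsigned-cast idiom: casting a negative `int` to `uint` yields a value of at least 2^31, which exceeds any valid span length, so a single unsigned comparison covers both `i >= 0` and `i < length`. The sketch below illustrates that equivalence; the class and method names are hypothetical, and it assumes the default unchecked arithmetic context and a non-negative `length`.

```csharp
using System;

public static class RangeCheckIdiom
{
    // Two comparisons: explicit lower and upper bound.
    public static bool InRangeSigned(int i, int length) => i >= 0 && i < length;

    // One comparison: a negative i converts to a uint >= 2^31,
    // which is larger than any non-negative length.
    public static bool InRangeUnsigned(int i, int length) => (uint)i < (uint)length;

    // Hypothetical helper: checks that both forms agree on boundary
    // values around zero and around the given non-negative length.
    public static bool FormsAgree(int length)
    {
        int[] samples = { int.MinValue, -1, 0, 1, length - 1, length, length + 1, int.MaxValue };
        foreach (int i in samples)
        {
            if (InRangeSigned(i, length) != InRangeUnsigned(i, length)) return false;
        }
        return true;
    }
}
```

Both forms accept exactly the same indices; the difference discussed in this thread is only in how many compare/branch pairs the JIT emits for each.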

EgorBo · 2023-08-05T11:07:27Z

The fix doesn't look to be trivial so moving to 9.0 🙁

jakobbotsch · 2024-04-11T14:09:53Z

Going to grab this one since #100777 should fix it.

…and make overflow check more precise (dotnet#100777) Fix dotnet#9422 Fix dotnet#83349

stephentoub added tenet-performance Performance related issue area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Mar 13, 2023

stephentoub added this to the 8.0.0 milestone Mar 13, 2023

stephentoub mentioned this issue Mar 13, 2023

Remove cctor from BitConverter #83196

Merged

EgorBo self-assigned this Mar 13, 2023

MihaZupan mentioned this issue Mar 27, 2023

Enable regex source gen / compiler vectorization of all sets #83992

Merged

stephentoub mentioned this issue Apr 3, 2023

Introduction to vectorization with Vector128 and Vector256 #84115

Merged

MihaZupan mentioned this issue Jun 30, 2023

Use SearchValues in PathString dotnet/aspnetcore#49117

Merged

EgorBo modified the milestones: 8.0.0, 9.0.0 Aug 5, 2023

jakobbotsch assigned jakobbotsch and unassigned EgorBo Apr 11, 2024

jakobbotsch mentioned this issue Apr 11, 2024

JIT: Add support for bounds check no throw assertions in range check and make overflow check more precise #100777

Merged

dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Apr 11, 2024

jakobbotsch closed this as completed in #100777 Apr 17, 2024

jakobbotsch closed this as completed in 69110bf Apr 17, 2024

matouskozak pushed a commit to matouskozak/runtime that referenced this issue Apr 30, 2024

JIT: Add support for bounds check no throw assertions in range check …

cd0cd23

…and make overflow check more precise (dotnet#100777) Fix dotnet#9422 Fix dotnet#83349

github-actions bot locked and limited conversation to collaborators May 18, 2024

Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this issue May 30, 2024

JIT: Add support for bounds check no throw assertions in range check …

41dafc5

…and make overflow check more precise (dotnet#100777) Fix dotnet#9422 Fix dotnet#83349
