Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upper bound effect on unsafe indexing #61498

Closed
Djoums opened this issue Nov 12, 2021 · 10 comments
Closed

Upper bound effect on unsafe indexing #61498

Djoums opened this issue Nov 12, 2021 · 10 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@Djoums
Copy link

Djoums commented Nov 12, 2021

Using .Net6 on Windows 10, VS Code.

Not sure if this is an issue or not, but I don't understand what's going on.

Consider the following code :

private const int Count = 1_000_000;
private readonly long[] Values = new long[Count];

public long Test1()
{
    long result = 0;
    foreach (long lg in new ValueEnumerator<long>(Values, Count))
        result ^= lg;
    return result;
}

public long Test2()
{
    long result = 0;
    foreach (long lg in new ValueEnumerator<long>(Values, Values.Length))
        result ^= lg;
    return result;
}

[StructLayout(LayoutKind.Auto)]
public struct ValueEnumerator<T>
{
    private readonly T[] Entries;
    private readonly int Count;
    private int CurrentIndex;

    public ValueEnumerator(T[] entries, int count) => (Entries, CurrentIndex, Count) = (entries, -1, count);

    public readonly ValueEnumerator<T> GetEnumerator() => this;
    public readonly T Current => Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(Entries), CurrentIndex);
    public bool MoveNext() => ++CurrentIndex < Count;
}

The only difference between Test1 and Test2 is the upper bound, either Count or Values.Length.

Given that the enumerator is using an unsafe indexer, I would assume that this would have no effect on performance whatsoever. But this is not the case. Running this code through Benchmark.Net gives me the following result :

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Test1 594.2 us 3.12 us 2.92 us - - - 1 B
Test2 498.5 us 4.54 us 4.25 us - - - -

Looking at sharplab.io, the asm is indeed different between the two functions, but I'm not fluent enough in it to understand what's going on.

@Djoums Djoums added the tenet-performance Performance related issue label Nov 12, 2021
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Nov 12, 2021
@GrabYourPitchforks GrabYourPitchforks added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 12, 2021
@ghost
Copy link

ghost commented Nov 12, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Using .Net6 on Windows 10, VS Code.

Not sure if this is an issue or not, but I don't understand what's going on.

Consider the following code :

private const int Count = 1_000_000;
private readonly long[] Values = new long[Count];

public long Test1()
{
    long result = 0;
    foreach (long lg in new ValueEnumerator<long>(Values, Count))
        result ^= lg;
    return result;
}

public long Test2()
{
    long result = 0;
    foreach (long lg in new ValueEnumerator<long>(Values, Values.Length))
        result ^= lg;
    return result;
}

[StructLayout(LayoutKind.Auto)]
public struct ValueEnumerator<T>
{
    private readonly T[] Entries;
    private readonly int Count;
    private int CurrentIndex;

    public ValueEnumerator(T[] entries, int count) => (Entries, CurrentIndex, Count) = (entries, -1, count);

    public readonly ValueEnumerator<T> GetEnumerator() => this;
    public readonly T Current => Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(Entries), CurrentIndex);
    public bool MoveNext() => ++CurrentIndex < Count;
}

The only difference between Test1 and Test2 is the upper bound, either Count or Values.Length.

Given that the enumerator is using an unsafe indexer, I would assume that this would have no effect on performance whatsoever. But this is not the case. Running this code through Benchmark.Net gives me the following result :

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Test1 594.2 us 3.12 us 2.92 us - - - 1 B
Test2 498.5 us 4.54 us 4.25 us - - - -

Looking at sharplab.io, the asm is indeed different between the two functions, but I'm not fluent enough in it to understand what's going on.

Author: Djoums
Assignees: -
Labels:

tenet-performance, area-CodeGen-coreclr, untriaged

Milestone: -

@JulieLeeMSFT JulieLeeMSFT added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Nov 12, 2021
@JulieLeeMSFT
Copy link
Member

@jakobbotsch PTAL.
CC @dotnet/jit-contrib

@jakobbotsch
Copy link
Member

Looks like we manage to eliminate the null-check of Values in Test2 (the access of Values.Length forces it to happen outside of the loop, which probably helps). In Test1 we end up null checking Values on every iteration.
Amusingly that means the following code should give better performance:

public long Test1()
{
    long result = 0;
    foreach (long lg in new ValueEnumerator<long>(Values, Count + Values.Length * 0))
        result ^= lg;
    return result;
}

Seems related to loop opts, cc @BruceForstall.

@BruceForstall
Copy link
Member

We do need the null check, since Test1 doesn't get it "for free" with the .Length check. But the null check itself is loop invariant.

It's interesting that loop hoisting does hoist the address calculation that the null check protects:

N006 (  5,  5) [000154] ---XG--N----              \--*  COMMA     byref  <l:$184, c:$183>
N002 (  2,  2) [000150] ---X---N----                 +--*  NULLCHECK byte   <l:$208, c:$207>
N001 (  1,  1) [000149] ------------                 |  \--*  LCL_VAR   ref    V14 tmp9         u:2 <l:$201, c:$82>
N005 (  3,  3) [000153] ------------                 \--*  ADD       byref  <l:$182, c:$181>
N003 (  1,  1) [000151] ------------                    +--*  LCL_VAR   ref    V14 tmp9         u:2 <l:$201, c:$82>
N004 (  1,  1) [000152] ------------                    \--*  CNS_INT   long   16 field offset Fseq[Data] $c2

(that is, [000153]), but says this about [000150]:

 [000150] not invariant: not handled by cse

@Djoums
Copy link
Author

Djoums commented Nov 13, 2021

Thanks for the answers.

The .Length trick does work, but not with a straight * 0 (I imagine the compiler optimizes it away). This works however, and is a net performance gain :

public long Test1()
{
    long result = 0;
    foreach (long lg in new ValueEnumerator<long>(Values, Count + Values.Length * 2 - Values.Length - Values.Length))
        result ^= lg;
    return result;
}

I understand ValueEnumerator<T>.Current requires a null check in general, but in this case Values cannot be null and is readonly, is there a way to take this into account ? (also the NRT option is on but not sure it has any effect on the JIT compiler).

Also, since Entries is a local readonly copy within the enumerator, is it possible to null check only once at the first iteration ?

@jakobbotsch
Copy link
Member

Thanks for the answers.

The .Length trick does work, but not with a straight * 0 (I imagine the compiler optimizes it away). This works however, and is a net performance gain :

public long Test1()
{
    long result = 0;
    foreach (long lg in new ValueEnumerator<long>(Values, Count + Values.Length * 2 - Values.Length - Values.Length))
        result ^= lg;
    return result;
}

The compiler optimizes Values.Length * 0 away yes, but it cannot remove the null check, so it has the same effect of removing the null check on every iteration. The difference between your version and the version I sent previously ends up being that in your version the JIT hoists out the bound and it stays in a register, while otherwise it ends up directly in the cmp instruction:

G_M43963_IG03:              ;; offset=0028H
       4C8BC9               mov      r9, rcx
       4C63D2               movsxd   r10, edx
       4F8B0CD1             mov      r9, qword ptr [r9+8*r10]
       4933C1               xor      rax, r9
       FFC2                 inc      edx
-       413BD0               cmp      edx, r8d
+       81FA40420F00         cmp      edx, 0xF4240
       7CEC                 jl       SHORT G_M43963_IG03

It's interesting that this also affects the performance.

I understand ValueEnumerator<T>.Current requires a null check in general, but in this case Values cannot be null and is readonly, is there a way to take this into account ? (also the NRT option is on but not sure it has any effect on the JIT compiler).

It is possible to set such fields using reflection and in general we do not really have the capability to reason "non-locally" about stuff like this. It would require looking at the constructor of the class and determining that no ref to Values escapes and that it is always initialized to non-null -- in the face of things like EnC or profiler rewriting this kind of analysis is also hard to rely on.

Also, since Entries is a local readonly copy within the enumerator, is it possible to null check only once at the first iteration ?

Right, that's what Bruce was referring to above with the fact that it is loop invariant -- we'd like to hoist the null check out of the loop body always. It's also possible we should be more aggressive about hoisting constants out of loops when register pressure is low.

@Djoums
Copy link
Author

Djoums commented Nov 14, 2021

Alright, thanks again for the explanations.
I don't know if this issue should be closed then, so I'll let you handle it 😉

@BruceForstall BruceForstall added this to the Future milestone Nov 15, 2021
@BruceForstall BruceForstall removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Nov 15, 2021
@BruceForstall
Copy link
Member

It looks related to #61420, but we can keep it open for when someone works on hoisting.

@jakobbotsch
Copy link
Member

On main we manage to hoist the null check in both cases now. This was probably fixed by #68588.

@ghost ghost locked as resolved and limited conversation to collaborators Jan 6, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

5 participants