Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RyuJIT generates unnecessary null check #4324

Closed
mikedn opened this issue Jun 20, 2015 · 5 comments
Closed

RyuJIT generates unnecessary null check #4324

mikedn opened this issue Jun 20, 2015 · 5 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Milestone

Comments

@mikedn
Copy link
Contributor

mikedn commented Jun 20, 2015

An unnecessary null check of this is generated even if it is known to be non-null due to a previous explicit check:

using System.Collections.Generic;
class Info {
    List<int> slice = new List<int>();
    public bool IsTracked(int x) {
        return slice != null && slice.Contains(x);
    }
}
class Visitor {
    Info info = new Info();
    [System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.NoInlining)]
    public int Test(int x) {
        if (info.IsTracked(x))
            return 0;
        return x;
    }
}
class Program {
    static int Main(string[] args) {
        var v = new Visitor();
        return v.Test(args.Length);
    }
}

Generated code for the relevant block in the Test method:

G_M2972_IG02:
       488B4908             mov      rcx, gword ptr [rcx+8]
       488B4908             mov      rcx, gword ptr [rcx+8]
       4885C9               test     rcx, rcx                 ;explicit null check
       740B                 je       SHORT G_M2972_IG03
       8BD6                 mov      edx, esi
       3909                 cmp      dword ptr [rcx], ecx     ;redundant null check
       E85389AA5E           call     System.Collections.Generic.List`1[Int32][System.Int32]:Contains(int):bool:this
       EB02                 jmp      SHORT G_M2972_IG04

This does not happen if info is copied into a local variable and used for both the null check and the Contains call. The null check optimization behaves as if the slice field is loaded twice and in that case the null check would be required to avoid a race condition. Probably there's another part of the optimizer which later decides to (correctly) eliminate the second load but it's too late for the null check to be removed.

category:cq
theme:cse
skill-level:intermediate
cost:small
impact:small

@mburbea
Copy link

mburbea commented Jun 22, 2015

Technically this is fine as in theory, between the first null check and the call to contains the method may have to yield to another method.

@mikedn
Copy link
Contributor Author

mikedn commented Jul 13, 2015

Turns out that the load is eliminated early enough, during CSE which precedes assertion propagation. The problem is that CSE transforms the LHS of != into a GT_COMMA and assertion propagation doesn't see the local variable that's behind the comma.

@mikedn
Copy link
Contributor Author

mikedn commented Jul 16, 2015

It looks like it's not only the comma that's causing problems, the local introduced by CSE isn't in SSA so assertions for it aren't generated anyway.

@mikedn
Copy link
Contributor Author

mikedn commented Jul 17, 2015

And on top of GT_COMMA and SSA the uses introduced by CSE copy the VN pair of the original expression but they should copy the VN pair of the CSE def as threading considerations that affect conservative VNs do not apply to locals introduced by CSE.

It would appear that this is relatively easy to fix if CSE locals can't have more than one def. Multiple defs would probably complicate things a bit.

@erozenfeld erozenfeld self-assigned this May 8, 2017
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 30, 2020
@msftgits msftgits added this to the Future milestone Jan 30, 2020
@erozenfeld erozenfeld removed their assignment Oct 12, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Jan 24, 2023
@AndyAyersMS
Copy link
Member

And on top of GT_COMMA and SSA the uses introduced by CSE copy the VN pair of the original expression but they should copy the VN pair of the CSE def as threading considerations that affect conservative VNs do not apply to locals introduced by CSE.

It would appear that this is relatively easy to fix if CSE locals can't have more than one def. Multiple defs would probably complicate things a bit.

This was fixed by dotnet/coreclr#9451.

Current codegen has eliminated the null check:

; Method Visitor:Test(int):int:this (FullOpts)
G_M31901_IG01:  ;; offset=0x0000
       push     rsi
       push     rbx
       sub      rsp, 40
       mov      ebx, edx
						;; size=8 bbWeight=1 PerfScore 2.50

G_M31901_IG02:  ;; offset=0x0008
       mov      rcx, gword ptr [rcx+0x08]
       mov      rcx, gword ptr [rcx+0x08]
       test     rcx, rcx
       je       SHORT G_M31901_IG06
						;; size=13 bbWeight=1 PerfScore 5.25

G_M31901_IG03:  ;; offset=0x0015
       mov      r9d, dword ptr [rcx+0x10]
       test     r9d, r9d
       jne      SHORT G_M31901_IG04
       xor      esi, esi
       jmp      SHORT G_M31901_IG05
						;; size=13 bbWeight=0.50 PerfScore 2.75

G_M31901_IG04:  ;; offset=0x0022
       mov      rcx, gword ptr [rcx+0x08]
       mov      edx, ebx
       xor      r8d, r8d
       call     [System.Array:IndexOf[int](int[],int,int,int):int]
       mov      esi, eax
       not      esi
       shr      esi, 31
						;; size=22 bbWeight=0.50 PerfScore 3.25

G_M31901_IG05:  ;; offset=0x0038
       test     esi, esi
       jne      SHORT G_M31901_IG08
						;; size=4 bbWeight=0.50 PerfScore 0.62

G_M31901_IG06:  ;; offset=0x003C
       mov      eax, ebx
						;; size=2 bbWeight=0.50 PerfScore 0.12

G_M31901_IG07:  ;; offset=0x003E
       add      rsp, 40
       pop      rbx
       pop      rsi
       ret      
						;; size=7 bbWeight=0.50 PerfScore 1.12

G_M31901_IG08:  ;; offset=0x0045
       xor      eax, eax
						;; size=2 bbWeight=0.50 PerfScore 0.12

G_M31901_IG09:  ;; offset=0x0047
       add      rsp, 40
       pop      rbx
       pop      rsi
       ret      
						;; size=7 bbWeight=0.50 PerfScore 1.12
; Total bytes of code: 78

There's also a partially redundant branch in IG05 which RBO doesn't see because when RBO runs the join/split is factored across two blocks, and the value tested is a copy of the phi at the join:

image - 2024-06-01T103337 835

Handling this would require a few generalizations of jump threading:

  • if block is a fork and has a linear chain of preds, search up the chain for a join
  • if value being tested is a copy, track the copy and ensure all copy dests don't outlive the fork
  • if a join is found check for phi for the active copy

The split up flow is created during the post morph flow opts, so perhaps that could be enhanced. If so I think RBO would be able to handle this case.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

6 participants