Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unoptimal codegen for "obj is T" with T being struct/sealed #36649

Closed
Sergio0694 opened this issue May 18, 2020 · 5 comments · Fixed by #103391
Closed

Unoptimal codegen for "obj is T" with T being struct/sealed #36649

Sergio0694 opened this issue May 18, 2020 · 5 comments · Fixed by #103391
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI in-pr There is an active PR which will close this issue when it is merged optimization tenet-performance Performance related issue
Milestone

Comments

@Sergio0694
Copy link
Contributor

Sergio0694 commented May 18, 2020

Follow up from a question in #1817 (here), cc. @EgorBo.

Description

I think I've identified 4 scenarios where the JIT doesn't produce optimal codegen for an object is T or object is T variable expression, when T is either a struct or a `sealed class.

object is T, when T is a struct (click to expand)
public static bool Is_Slow<T>(object obj) where T : struct
{
    return obj is T;
}
; using T = int
C.Is_Slow[[System.Int32, System.Private.CoreLib]](System.Object)
    L0000: test rcx, rcx
    L0003: je short L0016
    L0005: mov rax, 0x7ff9d6bdb1e8
    L000f: cmp [rcx], rax
    L0012: je short L0016
    L0014: xor ecx, ecx
    L0016: test rcx, rcx
    L0019: setne al
    L001c: movzx eax, al
    L001f: ret

Note how the JIT creates two separate branches, one per condition (null check and type check). This can be improved by just rewriting the code manually to perform those two checks individually:

public static bool Is_Fast<T>(object obj) where T : struct
{
    return obj != null && obj.GetType() == typeof(T);
}
C.Is_Fast[[System.Int32, System.Private.CoreLib]](System.Object)
    L0000: test rcx, rcx
    L0003: je short L0019
    L0005: mov rax, 0x7ff9d6bdb1e8
    L000f: cmp [rcx], rax
    L0012: sete al
    L0015: movzx eax, al
    L0018: ret
    L0019: xor eax, eax
    L001b: ret

Here the type check is just done with a cmp + setz, removing one conditional branch entirely.

object is T value, when T is a struct (click to expand)
public static T UnboxOrDefault_Slow<T>(object obj) where T : struct
{
    return (obj is T value) ? value : default;
}
C.UnboxOrDefault_Slow[[System.Int32, System.Private.CoreLib]](System.Object)
    L0000: push rsi
    L0001: sub rsp, 0x20
    L0005: mov rsi, rcx
    L0008: mov rdx, rsi
    L000b: test rdx, rdx
    L000e: je short L0021
    L0010: mov rcx, 0x7ff9d6bdb1e8
    L001a: cmp [rdx], rcx
    L001d: je short L0021
    L001f: xor edx, edx
    L0021: test rdx, rdx
    L0024: je short L0050
    L0026: mov rdx, 0x7ff9d6bdb1e8
    L0030: cmp [rsi], rdx
    L0033: je short L0047
    L0035: mov rdx, rsi
    L0038: mov rcx, 0x7ff9d6bdb1e8
    L0042: call 0x00007ffa366e04a0
    L0047: mov eax, [rsi+8]
    L004a: add rsp, 0x20
    L004e: pop rsi
    L004f: ret
    L0050: xor eax, eax
    L0052: add rsp, 0x20
    L0056: pop rsi
    L0057: ret

In this case the JIT creates 4 branches, two for the is check and 2 for the unbox.any opcode, as the runtime unfortunately still doesn't support/emit the no. prefix. Anyway, here's with explicit code:

public static T UnboxOrDefault_Fast<T>(object obj) where T : struct
{
    return obj != null && obj.GetType() == typeof(T) ? (T)obj : default;
}
C.UnboxOrDefault_Fast[[System.Int32, System.Private.CoreLib]](System.Object)
    L0000: push rsi
    L0001: sub rsp, 0x20
    L0005: mov rsi, rcx
    L0008: test rsi, rsi
    L000b: je short L001c
    L000d: mov rax, 0x7ff9d6bdb1e8
    L0017: cmp [rsi], rax
    L001a: je short L0024
    L001c: xor eax, eax
    L001e: add rsp, 0x20
    L0022: pop rsi
    L0023: ret
    L0024: mov rdx, 0x7ff9d6bdb1e8
    L002e: cmp [rsi], rdx
    L0031: je short L0045
    L0033: mov rdx, rsi
    L0036: mov rcx, 0x7ff9d6bdb1e8
    L0040: call 0x00007ffa366e04a0
    L0045: mov eax, [rsi+8]
    L0048: add rsp, 0x20
    L004c: pop rsi
    L004d: ret

As with the previous case, one less conditional branch and slightly smaller codegen.

object is T, when T is a sealed class (click to expand)
public sealed class Model
{
    public static bool Is_Slow<T>(object obj)
    {
        return obj is Model;
    }
}
Model.Is_Slow[[System.Int32, System.Private.CoreLib]](System.Object)
    L0000: test rcx, rcx
    L0003: je short L0016
    L0005: mov rax, 0x7ff9ded6cdf0
    L000f: cmp [rcx], rax
    L0012: je short L0016
    L0014: xor ecx, ecx
    L0016: test rcx, rcx
    L0019: setne al
    L001c: movzx eax, al
    L001f: ret

And here is with the manual checks just like the first two cases:

public sealed class Model
{
    public static bool Is_Fast<T>(object obj)
    {
        return obj != null && obj.GetType() == typeof(Model);
    }
}
Model.Is_Fast[[System.Int32, System.Private.CoreLib]](System.Object)
    L0000: test rcx, rcx
    L0003: je short L0019
    L0005: mov rax, 0x7ff9ded6cdf0
    L000f: cmp [rcx], rax
    L0012: sete al
    L0015: movzx eax, al
    L0018: ret
    L0019: xor eax, eax
    L001b: ret
object is T value, when T is a sealed class (click to expand)
public sealed class Model
{
    public static Model GetOrNull_Slow<T>(object obj)
    {
        if (obj is Model model) return model;
        return null;
    }
}
Model.GetOrNull_Slow[[System.Int32, System.Private.CoreLib]](System.Object)
    L0000: mov rax, rcx
    L0003: test rax, rax
    L0006: je short L0019
    L0008: mov rdx, 0x7ff9df11ce10
    L0012: cmp [rax], rdx
    L0015: je short L0019
    L0017: xor eax, eax
    L0019: test rax, rax
    L001c: je short L001f
    L001e: ret
    L001f: xor eax, eax
    L0021: ret

As above, one redundant conditional branch. Here is with explicit checks, note I'm using Unsafe.As<T>(object) here to force the JIT not to emit additional checks, as a standard (T) cast would result in worse codegen.

public sealed class Model
{
    public static Model GetOrNull_Fast<T>(object obj)
    {
        if (obj != null && obj.GetType() == typeof(Model))
        {
            return Unsafe.As<Model>(obj);
        }
        return null;
    }
}
Model.GetOrNull_Fast[[System.Int32, System.Private.CoreLib]](System.Object)
    L0000: test rcx, rcx
    L0003: je short L0018
    L0005: mov rax, 0x7ff9df11ce10
    L000f: cmp [rcx], rax
    L0012: jne short L0018
    L0014: mov rax, rcx
    L0017: ret
    L0018: xor eax, eax
    L001a: ret

Here we once again have one less conditional branch than the one produced by the is operator.

Note: in this last case, we could rewrite the first method as simply returning as Model, which correctly optimizes the final codegen and results in even smaller code size. I figured it was still worth pointing out the missed optimization when writing the code through the is operator though, as devs might very well still use it for a variety of reasons.

There are mainly two potential improvements I'm seeing:

  • One less conditional branch in the "fast" version
  • Slightly smaller codegen (this might in part go away if the method is inlined though)

Configuration

Tested on sharplab.io, in Default, x64 and Roslyn master branches.
All assembly is from the Release configuration.

category:cq
theme:optimization
skill-level:expert
cost:medium
impact:small

@Sergio0694 Sergio0694 added the tenet-performance Performance related issue label May 18, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels May 18, 2020
@EgorBo
Copy link
Member

EgorBo commented May 18, 2020

The issue can be simplified to:

public static bool Is_Slow(object obj) =>
    obj is int;

public static bool Is_Fast(object obj) =>
    obj != null && obj.GetType() == typeof(int);

IR for Is_Slow (before Rationalize IR)

Trees before Rationalize IR

--------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
--------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1        [000..???)-> BB04 ( cond )                     i label target 
BB02 [0003]  1       BB01                  0.25     [???..???)-> BB04 ( cond )                     i 
BB03 [0002]  1       BB02                  0.12     [???..???)                                     i 
BB04 [0001]  3       BB03,BB02,BB01        1        [???..00A)        (return)                     i label target 
--------------------------------------------------------------------------------------------------------------

------------ BB01 [000..???) -> BB04 (cond), preds={} succs={BB02,BB04}

***** BB01
STMT00004 (IL 0x000...  ???)
N003 (  1,  3) [000024] -A------R---              *  ASG       ref    $80
N002 (  1,  1) [000023] D------N----              +--*  LCL_VAR   ref    V02 tmp1         d:1 $80
N001 (  1,  1) [000012] ------?-----              \--*  LCL_VAR   ref    V00 arg0         u:1 $80

***** BB01
STMT00002 (IL 0x000...  ???)
N004 (  5,  5) [000021] ------------              *  JTRUE     void  
N003 (  3,  3) [000007] J------N----              \--*  EQ        int    $100
N001 (  1,  1) [000006] ------------                 +--*  LCL_VAR   ref    V02 tmp1         u:1 $80
N002 (  1,  1) [000005] ------------                 \--*  CNS_INT   ref    null $VN.Null

------------ BB02 [???..???) -> BB04 (cond), preds={BB01} succs={BB03,BB04}

***** BB02
STMT00003 (IL 0x000...  ???)
N005 (  9, 15) [000022] -----O------              *  JTRUE     void  
N004 (  7, 13) [000004] J----O?N----              \--*  EQ        int    $102
N002 (  3,  2) [000003] n----O?-----                 +--*  IND       long   $1c0
N001 (  1,  1) [000002] ------?-----                 |  \--*  LCL_VAR   ref    V02 tmp1         u:1 (last use) $80
N003 (  3, 10) [000001] ------?-----                 \--*  CNS_INT(h) long   0x7ffc2247dea0 class $200

------------ BB03 [???..???), preds={BB02} succs={BB04}

***** BB03
STMT00005 (IL 0x000...  ???)
N003 (  1,  3) [000026] -A------R---              *  ASG       ref    $VN.Null
N002 (  1,  1) [000025] D------N----              +--*  LCL_VAR   ref    V02 tmp1         d:3 $VN.Null
N001 (  1,  1) [000009] ------?-----              \--*  CNS_INT   ref    null $VN.Null

------------ BB04 [???..00A) (return), preds={BB03,BB02,BB01} succs={}

***** BB04
STMT00006 (IL   ???...  ???)
N005 (  0,  0) [000029] -A------R---              *  ASG       ref   
N004 (  0,  0) [000027] D------N----              +--*  LCL_VAR   ref    V02 tmp1         d:2
N003 (  0,  0) [000028] ------------              \--*  PHI       ref   
N001 (  0,  0) [000031] ------------ pred BB03       +--*  PHI_ARG   ref    V02 tmp1         u:3 $VN.Null
N002 (  0,  0) [000030] ------------ pred BB01       \--*  PHI_ARG   ref    V02 tmp1         u:1 $80

***** BB04
STMT00001 (IL   ???...  ???)
N004 (  7,  4) [000020] ------------              *  RETURN    int    $280
N003 (  6,  3) [000019] N-----------              \--*  NE        int    $103
N001 (  1,  1) [000017] ------------                 +--*  LCL_VAR   ref    V02 tmp1         u:2 (last use) $240
N002 (  1,  1) [000018] ------------                 \--*  CNS_INT   ref    null $VN.Null

IR for Is_Fast (before Rationalize IR)

Trees before Rationalize IR

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1        [000..003)-> BB03 ( cond )                     i label target 
BB02 [0001]  1       BB01                  0.50     [003..019)        (return)                     i 
BB03 [0002]  1       BB01                  0.50     [019..01B)        (return)                     i label target 
-----------------------------------------------------------------------------------------------------------------------------------------

------------ BB01 [000..003) -> BB03 (cond), preds={} succs={BB02,BB03}

***** BB01
STMT00000 (IL 0x000...0x001)
N004 (  5,  5) [000003] ------------              *  JTRUE     void  
N003 (  3,  3) [000002] J------N----              \--*  EQ        int    $c0
N001 (  1,  1) [000000] ------------                 +--*  LCL_VAR   ref    V00 arg0         u:1 $80
N002 (  1,  1) [000001] ------------                 \--*  CNS_INT   ref    null $VN.Null

------------ BB02 [003..019) (return), preds={BB01} succs={}

***** BB02
STMT00002 (IL 0x003...0x018)
N005 ( 11, 14) [000014] -----O------              *  RETURN    int    $101
N004 ( 10, 13) [000013] -----O------              \--*  EQ        int    $c2
N002 (  3,  2) [000012] n----O------                 +--*  IND       long   $1c0
N001 (  1,  1) [000006] ------------                 |  \--*  LCL_VAR   ref    V00 arg0         u:1 (last use) $80
N003 (  3, 10) [000008] ------------                 \--*  CNS_INT(h) long   0x7ffc224adea0 class $200

------------ BB03 [019..01B) (return), preds={BB01} succs={}

***** BB03
STMT00001 (IL 0x019...0x01A)
N002 (  2,  2) [000005] ------------              *  RETURN    int    $100
N001 (  1,  1) [000004] ------------              \--*  CNS_INT   int    0 $40

I guess the problem here is how isinst is imported - when JIT imports it it doesn't know isinst is only used for true/false and we don't care about the instance isinst returns.

@AndyAyersMS
Copy link
Member

Yeah, I think the fix is upstream, either in the initial QMARK formation or in morph's expansion.

In a case like this we might also run into issues as we settle on the optimal number of returns early.

@BruceForstall BruceForstall added this to the Future milestone May 18, 2020
@BruceForstall BruceForstall added optimization and removed untriaged New issue has not been triaged by the area owner labels May 18, 2020
@AndyAyersMS
Copy link
Member

Not there is already code in the jit that tries to optimize this sort of flow pattern, but it looks like there are some odd assumptions / missing cases.

  • fgOptimizeUncondBranchToSimpleCond will tail duplicate (which is the opt we need here) but it requires a BBJ_ALWAYS pred for the join block; here we only have BBJ_COND / BBJ_NEXT.
  • fgExpandQmarkForCastInstOf handles the qmark expansion for this case

As noted above, tail duping a return block may be problematic in some cases. We are sometimes constrained on number of return blocks, and we decide fairly early how many returns we should have. Seems like this decision could possibly be deferred until after optimization.

Another approach might be to allow QMARK to remain unspilled in the importer so that we don't expand its flow until we understand how the result is consumed. We could blow these out in a post importer phase where it's easier to introduce flow (say in the "indirect call transform phase" which we could just acknowledge is now a general flow expansion phase) -- not sure if this is sufficient to rid ourselves of all QMARKs but it would be nice if it did.

Am going to play around first with generalizing the tail dup code first.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue May 27, 2020
1. Allow duplicating when predecessor is BBJ_NONE
2. Require that predecessor and successor reference the same local
3. Make sure that local is not address exposed
4. Check up to two statements in predecessor for local reference
5. Require successor to compare local to constant, or local to local

Changes inspired by dotnet#36649 but don't actually improve CQ for that case (yet).
AndyAyersMS added a commit that referenced this issue May 28, 2020
1. Allow duplicating when predecessor is BBJ_NONE
2. Require that predecessor and successor reference the same local
3. Make sure that local is not address exposed
4. Check up to two statements in predecessor for local reference
5. Require successor to compare local to constant, or local to local

Changes inspired by #36649 but don't actually improve CQ for that case (yet).

Also, add morph to post-phase whitelist because it seems odd not
to dump the IR after morph.
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 25, 2020
@Sergio0694
Copy link
Contributor Author

Sergio0694 commented Jun 8, 2022

Stumbled upon a new example of this (sharplab link):

[JitGeneric(typeof(int))]
public class Test<T>
    where T : unmanaged
{
    public static bool TryGetCommandArgument(object obj, out T result)
    {        
        if (obj is T argument)
        {
            result = argument;
            
            return true;
        }
        
        result = default;
            
        return false;
    }
    
    public static bool TryGetCommandArgument_Manual(object obj, out T result)
    {
        if (obj is not null && obj.GetType() == typeof(T))
        {
            // Note: I'm aware this is UB, it's just to illustrate the codegen.
            // The JIT should conceptually do the same anyway.
            result = Unsafe.As<StrongBox<T>>(obj).Value;
            
            return true;
        }
        
        result = default;
            
        return false;
    }
}
Codegen for TryGetCommandArgument (click to expand)
Test`1[[System.Int32, System.Private.CoreLib]].TryGetCommandArgument(System.Object, Int32 ByRef)
    L0000: push rdi
    L0001: push rsi
    L0002: sub rsp, 0x28
    L0006: mov rsi, rcx
    L0009: mov rdi, rdx
    L000c: mov rdx, rsi
    L000f: test rdx, rdx
    L0012: je short L0058
    L0014: mov rdx, [rdx]
    L0017: mov rcx, 0x7ffd9ab69480
    L0021: cmp rdx, rcx
    L0024: jne short L0058
    L0026: mov rcx, 0x7ffd9ab69480
    L0030: cmp rdx, rcx
    L0033: je short L0047
    L0035: mov rdx, rsi
    L0038: mov rcx, 0x7ffd9ab69480
    L0042: call 0x00007ffd9aad7478
    L0047: mov eax, [rsi+8]
    L004a: mov [rdi], eax
    L004c: mov eax, 1
    L0051: add rsp, 0x28
    L0055: pop rsi
    L0056: pop rdi
    L0057: ret
    L0058: xor eax, eax
    L005a: mov [rdi], eax
    L005c: add rsp, 0x28
    L0060: pop rsi
    L0061: pop rdi
    L0062: ret
Codegen for TryGetCommandArgument_Manual (click to expand)
Test`1[[System.Int32, System.Private.CoreLib]].TryGetCommandArgument_Manual(System.Object, Int32 ByRef)
    L0000: test rcx, rcx
    L0003: je short L001f
    L0005: mov rax, 0x7ffd9ab69480
    L000f: cmp [rcx], rax
    L0012: jne short L001f
    L0014: mov eax, [rcx+8]
    L0017: mov [rdx], eax
    L0019: mov eax, 1
    L001e: ret
    L001f: xor eax, eax
    L0021: mov [rdx], eax
    L0023: ret

@jakobbotsch
Copy link
Member

#103391 fixes what I would consider the important part of this (multiple convoluted checks done on top of the previous checks), but it does not turn the cmp, jne, mov eax 1 into the cmp, setz, movzx. Seems like if-conversion does not get the case for some reason.

@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Jun 13, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jul 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI in-pr There is an active PR which will close this issue when it is merged optimization tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants