Unoptimal codegen for "obj is T" with T being struct/sealed #36649
The issue can be simplified to:
public static bool Is_Slow(object obj) =>
    obj is int;
public static bool Is_Fast(object obj) =>
    obj != null && obj.GetType() == typeof(int);
IR for
IR for
I guess the problem here is how |
Yeah, I think the fix is upstream, either in the initial In a case like this we might also run into issues as we settle on the optimal number of returns early. |
Note that there is already code in the jit that tries to optimize this sort of flow pattern, but it looks like there are some odd assumptions / missing cases.
As noted above, tail duping a return block may be problematic in some cases. We are sometimes constrained on the number of return blocks, and we decide fairly early how many returns we should have. It seems this decision could possibly be deferred until after optimization.
Another approach might be to allow QMARK to remain unspilled in the importer so that we don't expand its flow until we understand how the result is consumed. We could blow these out in a post-importer phase where it's easier to introduce flow (say in the "indirect call transform phase", which we could just acknowledge is now a general flow expansion phase). Not sure if this is sufficient to rid ourselves of all QMARKs, but it would be nice if it did.
I am going to play around with generalizing the tail dup code first.
1. Allow duplicating when predecessor is BBJ_NONE
2. Require that predecessor and successor reference the same local
3. Make sure that local is not address exposed
4. Check up to two statements in predecessor for local reference
5. Require successor to compare local to constant, or local to local
Changes inspired by #36649 but don't actually improve CQ for that case (yet).
Also, add morph to post-phase whitelist because it seems odd not to dump the IR after morph.
Stumbled upon a new example of this (sharplab link):
[JitGeneric(typeof(int))]
public class Test<T>
    where T : unmanaged
{
    public static bool TryGetCommandArgument(object obj, out T result)
    {
        if (obj is T argument)
        {
            result = argument;
            return true;
        }
        result = default;
        return false;
    }
    public static bool TryGetCommandArgument_Manual(object obj, out T result)
    {
        if (obj is not null && obj.GetType() == typeof(T))
        {
            // Note: I'm aware this is UB, it's just to illustrate the codegen.
            // The JIT should conceptually do the same anyway.
            result = Unsafe.As<StrongBox<T>>(obj).Value;
            return true;
        }
        result = default;
        return false;
    }
}
Codegen for TryGetCommandArgument:
Test`1[[System.Int32, System.Private.CoreLib]].TryGetCommandArgument(System.Object, Int32 ByRef)
L0000: push rdi
L0001: push rsi
L0002: sub rsp, 0x28
L0006: mov rsi, rcx
L0009: mov rdi, rdx
L000c: mov rdx, rsi
L000f: test rdx, rdx
L0012: je short L0058
L0014: mov rdx, [rdx]
L0017: mov rcx, 0x7ffd9ab69480
L0021: cmp rdx, rcx
L0024: jne short L0058
L0026: mov rcx, 0x7ffd9ab69480
L0030: cmp rdx, rcx
L0033: je short L0047
L0035: mov rdx, rsi
L0038: mov rcx, 0x7ffd9ab69480
L0042: call 0x00007ffd9aad7478
L0047: mov eax, [rsi+8]
L004a: mov [rdi], eax
L004c: mov eax, 1
L0051: add rsp, 0x28
L0055: pop rsi
L0056: pop rdi
L0057: ret
L0058: xor eax, eax
L005a: mov [rdi], eax
L005c: add rsp, 0x28
L0060: pop rsi
L0061: pop rdi
L0062: ret
Codegen for TryGetCommandArgument_Manual:
Test`1[[System.Int32, System.Private.CoreLib]].TryGetCommandArgument_Manual(System.Object, Int32 ByRef)
L0000: test rcx, rcx
L0003: je short L001f
L0005: mov rax, 0x7ffd9ab69480
L000f: cmp [rcx], rax
L0012: jne short L001f
L0014: mov eax, [rcx+8]
L0017: mov [rdx], eax
L0019: mov eax, 1
L001e: ret
L001f: xor eax, eax
L0021: mov [rdx], eax
L0023: ret
#103391 fixes what I would consider the important part of this (multiple convoluted checks done on top of the previous checks), but it does not turn the |
Follow up from a question in #1817 (here), cc. @EgorBo.
Description
I think I've identified 4 scenarios where the JIT doesn't produce optimal codegen for an `object is T` or `object is T variable` expression, when `T` is either a `struct` or a `sealed class`.
object is T, when T is a struct
Note how the JIT creates two separate branches, one per condition (`null` check and type check). This can be improved by just rewriting the code manually to perform those two checks individually:
Here the type check is just done with a `cmp` + `setz`, removing one conditional branch entirely.
object is T value, when T is a struct
In this case the JIT creates 4 branches, two for the `is` check and two for the `unbox.any` opcode, as the runtime unfortunately still doesn't support/emit the `no.` prefix. Anyway, here it is with explicit code:
As with the previous case, one less conditional branch and slightly smaller codegen.
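The collapsed snippets for the two struct cases are not reproduced in this text. As a rough sketch of what the pattern forms and the manual rewrites might look like for `T = int`: the class and method names below are hypothetical, and using `Unsafe.Unbox<int>` to read the payload is an assumption of this sketch, not necessarily what the original snippets did. Whether the JIT removes the type check implied by the unbox after the explicit comparison is not verified here.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical names, for illustration only.
public static class StructTypeChecks
{
    // Pattern form: the type test is left to the compiler and JIT.
    public static bool IsInt(object obj) => obj is int;

    // Manual form: explicit null check plus exact type comparison.
    // A boxed int always has runtime type System.Int32, so the two agree.
    public static bool IsInt_Manual(object obj) =>
        obj != null && obj.GetType() == typeof(int);

    // Pattern-variable form: type test followed by an unbox of the value.
    public static bool TryGetInt(object obj, out int value)
    {
        if (obj is int i)
        {
            value = i;
            return true;
        }
        value = default;
        return false;
    }

    // Manual form: once the exact type is known, read the boxed payload.
    // Unsafe.Unbox<T> returns a ref to the value inside the box.
    public static bool TryGetInt_Manual(object obj, out int value)
    {
        if (obj != null && obj.GetType() == typeof(int))
        {
            value = Unsafe.Unbox<int>(obj);
            return true;
        }
        value = default;
        return false;
    }
}
```

Both pairs should return the same results for any input, including `null`, boxed values of other types, and boxed enums; only the generated code is expected to differ.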
object is T, when T is a sealed class
And here it is with the manual checks, just like the first two cases:
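The original snippet is not included in this text. A minimal sketch of the sealed-class variant, assuming `string` as the sealed type (the issue does not say which type was used) and with hypothetical class and method names, could look like this. The pattern-variable version uses `Unsafe.As<T>(object)`, which the issue names as the way to avoid the extra checks of a regular cast.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical names; string stands in for any sealed class.
public static class SealedTypeChecks
{
    // Pattern form.
    public static bool IsString(object obj) => obj is string;

    // Manual form: string is sealed, so no derived type can exist and an
    // exact type comparison gives the same answer as the pattern form.
    public static bool IsString_Manual(object obj) =>
        obj != null && obj.GetType() == typeof(string);

    // Manual pattern-variable form: after the exact type check,
    // Unsafe.As reinterprets the reference without a second cast check.
    public static bool TryGetString_Manual(object obj, out string value)
    {
        if (obj != null && obj.GetType() == typeof(string))
        {
            value = Unsafe.As<string>(obj);
            return true;
        }
        value = null;
        return false;
    }
}
```

This equivalence only holds because the type is sealed; for a non-sealed class the exact comparison would reject derived instances that `is` accepts.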
object is T value, when T is a sealed class
As above, one redundant conditional branch. Here it is with explicit checks; note I'm using `Unsafe.As<T>(object)` here to force the JIT not to emit additional checks, as a standard `(T)` cast would result in worse codegen.
Here we once again have one less conditional branch than the one produced by the `is` operator.
There are mainly two potential improvements I'm seeing:
Configuration
Tested on sharplab.io, in `Default`, `x64` and Roslyn `master` branches.
All assembly is from the Release configuration.
category:cq
theme:optimization
skill-level:expert
cost:medium
impact:small