-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize box + isinst + unbox.any
as nop
#1817
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would also be good to get some idea how often we see this in actual code -- any examples you can point at? Any hits in jit-diffs?
If there are no jit-diffs hits, please consider adding some test cases.
@AndyAyersMS I've just added tests. The jit-diff is empty but maybe because this pattern was avoided. |
I imagine this pattern may become more common now that pattern matching exists (it is relatively new after all). |
Indeed! static int Test<T>(T o)
{
return o switch
{
int n => n,
string str => str.Length,
_ => 0
};
} |
src/coreclr/tests/src/JIT/Generics/Conversions/Boxing/box_isinst_unbox-noinline.csproj
Outdated
Show resolved
Hide resolved
Wonder if I should also handle
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the test cases. These examples all get optimized and no longer allocate, right?
Should we try and verify there are no GC allocations done by these methods? See for instance JIT\opt\ObjectStackAllocation\ObjectStackAllocationTests.cs
, though you shouldn't need to check if SPC is optimized or not.
And since this is an unconditional optimization (that is, we're optimizing these even when generating debuggable code), is it possible this might impair debuggability? I suppose C#'s debug IL ensures stack empty at each possible debug point, so probably not...
The cases where the box gets stored (or dup'd) are more complex; you'll need to be able to see all the places that local is used, and prove the boxed value can reach those uses. Tricky pull off early, without dataflow... |
Nope, some of them still allocate e.g. where Interfaces over vt and Nullable are involved
(and as you noted it can't be that easily fixed in the importer)
Should I hide the whole |
No. We want box opts to happen at Tier0, which (for now) implies no optimizations enabled. |
@AndyAyersMS is there anything left to improve in this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks!
This PR optimizes boxing for pattern matching idioms. Basically, it copies existing optimizations from CoreCLR to mono, see [impBoxPatternMatch](https://github.com/dotnet/runtime/blob/084dd1a5aae0fbe6ed4f9164ac4ee1c5f0b2398f/src/coreclr/src/jit/importer.cpp#L5856-L6032) (one of them was recently added by me for coreclr, see dotnet/runtime#1817) #### 1) `box + isinst + brtrue/brfalse` --> `ldc.i4.0/1 + brtrue/brfalse` ```csharp public static int Case1<T>(T t) { if (t is int) return 42; return 0; } /* IL_0000: ldarg.0 IL_0001: box !!T IL_0006: isinst [System.Private.CoreLib]System.Int32 IL_000b: brfalse.s IL_0010 IL_000d: ldc.i4.s 42 IL_000f: ret IL_0010: ldc.i4.0 IL_0011: ret */ ``` #### 2) `box + isinst + ldnull + ceq/cgt.un` --> `ldc.i4.0/1` ```csharp public static bool Case2<T>(T t) { return t is int; } /* IL_0000: ldarg.0 IL_0001: box !!T IL_0006: isinst [System.Private.CoreLib]System.Int32 IL_000b: ldnull IL_000c: cgt.un IL_000e: ret */ ``` #### 3) `box + isinst + unbox.any` --> `nop` ```csharp public static int Case3_1<T>(T o) { if (o is int x) return x; return 0; } // or public static int Case3_2<T>(T o) => o is int n ? n : 42; // or public static int Case3_3<T>(T o) { return o switch { int n => n, string str => str.Length, _ => 0 }; } ``` Tests can be found here: https://github.com/dotnet/runtime/tree/master/src/coreclr/tests/src/JIT/Generics/Conversions/Boxing [Sharplab.io link](https://sharplab.io/#v2:EYLgtghgzgLgpgJwDQxASwDYB8CwAoAAQAYACAZQAsIEAHAGQmADoAlAVwDsY0w4BufPgIBmEgQCMANjEAmEgGF8Ab3wk1Y0ROlouC6HHEAeACoA+ABTGSMAJSr1KvOuck0AMxLmYrqK652nFyCCAHYSABYZAUDgsKJo5wBfezUUjTEpEmAAe2yMPSg4GRMLK1s0xyC1UOsfPxgE9WSYkjSRDO1deX1hAH0jM0sSbIDnSqr3T2y6nW8AD1Gq5xq5xtiSeLTmts1M2YK4PuLBqxGSAF5TYZndDhIAfhI7kAiowRb2rXqDvuESoZGFTSyzC0ygAHc0DAAMYUYFVcZLIL7O6XJ5IeFIjKkWAIC5XXFMOhwDgAcxgFAxLSxal6+I2mJciTWJGaznwiSAA===) to observe IL code for the samples above. #### Codegen diff example: ```csharp static bool Test(int n) => Validate(n); static bool Validate<T>(T t) => t is int; ``` Before (for `Test`): ```asm 0000000000000000 pushq %r14 0000000000000002 pushq %rbx 0000000000000003 pushq %rax 0000000000000004 movl %edi, %ebx 0000000000000006 movabsq $0x7fdf9c5090c0, %rax 0000000000000010 movabsq $0x7fdfad0268e8, %r14 000000000000001a leaq 0x1be800(%r14), %rdi 0000000000000021 movl $0x14, %esi 0000000000000026 callq *(%rax) 0000000000000028 movl %ebx, 0x10(%rax) 000000000000002b movq (%rax), %rax 000000000000002e cmpq %r14, (%rax) 0000000000000031 sete %al 0000000000000034 addq $0x8, %rsp 0000000000000038 popq %rbx 0000000000000039 popq %r14 000000000000003b retq ``` After: ```asm 0000000000000000 movb $0x1, %al 0000000000000002 retq ```
This PR optimizes boxing for pattern matching idioms. Basically, it copies existing optimizations from CoreCLR to mono, see [impBoxPatternMatch](https://github.com/dotnet/runtime/blob/084dd1a5aae0fbe6ed4f9164ac4ee1c5f0b2398f/src/coreclr/src/jit/importer.cpp#L5856-L6032) (one of them was recently added by me for coreclr, see dotnet/runtime#1817) #### 1) `box + isinst + brtrue/brfalse` --> `ldc.i4.0/1 + brtrue/brfalse` ```csharp public static int Case1<T>(T t) { if (t is int) return 42; return 0; } /* IL_0000: ldarg.0 IL_0001: box !!T IL_0006: isinst [System.Private.CoreLib]System.Int32 IL_000b: brfalse.s IL_0010 IL_000d: ldc.i4.s 42 IL_000f: ret IL_0010: ldc.i4.0 IL_0011: ret */ ``` #### 2) `box + isinst + ldnull + ceq/cgt.un` --> `ldc.i4.0/1` ```csharp public static bool Case2<T>(T t) { return t is int; } /* IL_0000: ldarg.0 IL_0001: box !!T IL_0006: isinst [System.Private.CoreLib]System.Int32 IL_000b: ldnull IL_000c: cgt.un IL_000e: ret */ ``` #### 3) `box + isinst + unbox.any` --> `nop` ```csharp public static int Case3_1<T>(T o) { if (o is int x) return x; return 0; } // or public static int Case3_2<T>(T o) => o is int n ? n : 42; // or public static int Case3_3<T>(T o) { return o switch { int n => n, string str => str.Length, _ => 0 }; } ``` Tests can be found here: https://github.com/dotnet/runtime/tree/master/src/coreclr/tests/src/JIT/Generics/Conversions/Boxing [Sharplab.io link](https://sharplab.io/#v2:EYLgtghgzgLgpgJwDQxASwDYB8CwAoAAQAYACAZQAsIEAHAGQmADoAlAVwDsY0w4BufPgIBmEgQCMANjEAmEgGF8Ab3wk1Y0ROlouC6HHEAeACoA+ABTGSMAJSr1KvOuck0AMxLmYrqK652nFyCCAHYSABYZAUDgsKJo5wBfezUUjTEpEmAAe2yMPSg4GRMLK1s0xyC1UOsfPxgE9WSYkjSRDO1deX1hAH0jM0sSbIDnSqr3T2y6nW8AD1Gq5xq5xtiSeLTmts1M2YK4PuLBqxGSAF5TYZndDhIAfhI7kAiowRb2rXqDvuESoZGFTSyzC0ygAHc0DAAMYUYFVcZLIL7O6XJ5IeFIjKkWAIC5XXFMOhwDgAcxgFAxLSxal6+I2mJciTWJGaznwiSAA===) to observe IL code for the samples above. #### Codegen diff example: ```csharp static bool Test(int n) => Validate(n); static bool Validate<T>(T t) => t is int; ``` Before (for `Test`): ```asm 0000000000000000 pushq %r14 0000000000000002 pushq %rbx 0000000000000003 pushq %rax 0000000000000004 movl %edi, %ebx 0000000000000006 movabsq $0x7fdf9c5090c0, %rax 0000000000000010 movabsq $0x7fdfad0268e8, %r14 000000000000001a leaq 0x1be800(%r14), %rdi 0000000000000021 movl $0x14, %esi 0000000000000026 callq *(%rax) 0000000000000028 movl %ebx, 0x10(%rax) 000000000000002b movq (%rax), %rax 000000000000002e cmpq %r14, (%rax) 0000000000000031 sete %al 0000000000000034 addq $0x8, %rsp 0000000000000038 popq %rbx 0000000000000039 popq %r14 000000000000003b retq ``` After: ```asm 0000000000000000 movb $0x1, %al 0000000000000002 retq ``` Co-authored-by: EgorBo <EgorBo@users.noreply.github.com>
Commenting here following a conversation with @tannergooding on the UWP Community Discord Server, I was wondering whether this PR would also address some codegen other inefficiencies I've noticed or whether I should open a separate issue for them. You can see that in all these 4 cases, it's possible to produce better codegen by doing checks manually. I might be missing something, but I think the JIT should be able to understand the context here and perform these optimizations automatically? Posting just images here for brevity, if you confirm I should open a separate issue to track these I'll write the various C#/asm snippets properly with markdown code blocks of course 😊 Bonus point: case 2) could also use the |
@Sergio0694 I may be better to create a new issue rather than add a comment to a merged PR. |
@Sergio0694 not sure, had a quick glance and looks like the main issue here is how JIT rotates comparisons but aside that "slow" and "fast" look the same to me |
@EgorBo Thanks for taking a look, appreciate it! 😊 There are mainly two differences I'm seeing in all of them:
Consider the first example: The "fast" version has a single branch just for the The worst offender would be the second example though, where the |
@Sergio0694 feel free to file an issue. Ideally, with code, not the screenshots ;-) |
@EgorBo Absolutely, will do! And yeah don't worry, the issue will have actual code, the screens were just to quickly put together this exploratory question to make sure I wasn't asking a dupe of something else 😄 |
This pattern may appear in pattern matching + generics, e.g.:
see IL in sharplab.io
Current codegen (for
Test<int>
):New codegen (for
Test<int>
):PS: it doesn't fix cases where interfaces/objects are used + inlining instead of generics -- these cases require more work