Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize box + isinst + unbox.any as nop #1817

Merged
merged 10 commits into from
Feb 4, 2020

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Jan 16, 2020

This pattern may appear in pattern matching + generics, e.g.:

int Test1<T>(T o)
{
    if (o is int x)
        return x;
    return 0;
}


// or


int Test2<T>(T o) => o is int n ? n : 42;


// or


int Test3<T>(T o)
{
    return o switch
    {
        int n => n,
        string str => str.Length,
        _ => 0
    };
}

see IL in sharplab.io

Current codegen (for Test<int>):

    L0000: push rsi
    L0001: sub rsp, 0x20
    L0005: mov esi, edx
    L0007: mov rcx, 0x7ffec352b1e8
    L0011: call 0x7fff23084690
    L0016: mov [rax+0x8], esi
    L0019: mov eax, [rax+0x8]
    L001c: add rsp, 0x20
    L0020: pop rsi
    L0021: ret

New codegen (for Test<int>):

    L0000: mov eax, edx
    L0002: ret

PS: it doesn't fix cases where interfaces/objects are used + inlining instead of generics -- these cases require more work

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 16, 2020
Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would also be good to get some idea how often we see this in actual code -- any examples you can point at? Any hits in jit-diffs?

If there are no jit-diffs hits, please consider adding some test cases.

src/coreclr/src/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/importer.cpp Outdated Show resolved Hide resolved
@EgorBo
Copy link
Member Author

EgorBo commented Jan 16, 2020

@AndyAyersMS I've just added tests. The jit-diff is empty but maybe because this pattern was avoided.
I was implementing existing optimizations from impBoxPatternMatch in Mono runtime and noticed this case and decided to optimize since the fix takes just a several lines of code.

@tannergooding
Copy link
Member

I imagine this pattern may become more common now that pattern matching exists (it is relatively new after all).

@EgorBo
Copy link
Member Author

EgorBo commented Jan 16, 2020

I imagine this pattern may become more common now that pattern matching exists (it is relatively new after all).

Indeed!
E.g.:

    static int Test<T>(T o)
    {
        return o switch
        {
            int n => n,
            string str => str.Length,
            _ => 0
        };
    }

@EgorBo
Copy link
Member Author

EgorBo commented Jan 16, 2020

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the test cases. These examples all get optimized and no longer allocate, right?

Should we try and verify there are no GC allocations done by these methods? See for instance JIT\opt\ObjectStackAllocation\ObjectStackAllocationTests.cs, though you shouldn't need to check if SPC is optimized or not.

And since this is an unconditional optimization (that is, we're optimizing these even when generating debuggable code), is it possible this might impair debuggability? I suppose C#'s debug IL ensures stack empty at each possible debug point, so probably not...

src/coreclr/src/jit/importer.cpp Outdated Show resolved Hide resolved
@AndyAyersMS
Copy link
Member

The cases where the box gets stored (or dup'd) are more complex; you'll need to be able to see all the places that local is used, and prove the boxed value can reach those uses. Tricky pull off early, without dataflow...

@EgorBo
Copy link
Member Author

EgorBo commented Jan 18, 2020

Thanks for the test cases. These examples all get optimized and no longer allocate, right?

Nope, some of them still allocate e.g. where Interfaces over vt and Nullable are involved
mostly due to

    box !!T
    isinst [...]
    stloc.0
    ldloc.0
    brtrue/brfalse

(and as you noted it can't be that easily fixed in the importer)

And since this is an unconditional optimization (that is, we're optimizing these even when generating debuggable code), is it possible this might impair debuggability? I suppose C#'s debug IL ensures stack empty at each possible debug point, so probably not...

Should I hide the whole impBoxPatternMatch under OptimizationsEnabled condition?

@AndyAyersMS
Copy link
Member

Should I hide the whole impBoxPatternMatch under OptimizationsEnabled condition?

No. We want box opts to happen at Tier0, which (for now) implies no optimizations enabled.

@EgorBo
Copy link
Member Author

EgorBo commented Feb 4, 2020

@AndyAyersMS is there anything left to improve in this PR?

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks!

monojenkins pushed a commit to monojenkins/mono that referenced this pull request Mar 4, 2020
This PR optimizes boxing for pattern matching idioms. Basically, it copies existing optimizations from CoreCLR to mono, see [impBoxPatternMatch](https://github.com/dotnet/runtime/blob/084dd1a5aae0fbe6ed4f9164ac4ee1c5f0b2398f/src/coreclr/src/jit/importer.cpp#L5856-L6032) (one of them was recently added by me for coreclr, see dotnet/runtime#1817)
#### 1) `box + isinst + brtrue/brfalse`  -->  `ldc.i4.0/1 + brtrue/brfalse`
```csharp
public static int Case1<T>(T t)
{
    if (t is int)
        return 42;
    return 0;
}
/*
    IL_0000: ldarg.0
    IL_0001: box !!T
    IL_0006: isinst [System.Private.CoreLib]System.Int32
    IL_000b: brfalse.s IL_0010
    IL_000d: ldc.i4.s 42
    IL_000f: ret
    IL_0010: ldc.i4.0
    IL_0011: ret
*/
```
#### 2) `box + isinst + ldnull + ceq/cgt.un`  -->  `ldc.i4.0/1`
```csharp
public static bool Case2<T>(T t)
{
    return t is int;
}
/*
    IL_0000: ldarg.0
    IL_0001: box !!T
    IL_0006: isinst [System.Private.CoreLib]System.Int32
    IL_000b: ldnull
    IL_000c: cgt.un
    IL_000e: ret
*/
```
#### 3) `box + isinst + unbox.any`  -->  `nop`
```csharp
public static int Case3_1<T>(T o)
{
    if (o is int x)
        return x;
    return 0;
}

// or

public static int Case3_2<T>(T o) => o is int n ? n : 42;

// or

public static int Case3_3<T>(T o)
{
    return o switch
    {
        int n => n,
        string str => str.Length,
        _ => 0
    };
}
```
Tests can be found here: https://github.com/dotnet/runtime/tree/master/src/coreclr/tests/src/JIT/Generics/Conversions/Boxing

[Sharplab.io link](https://sharplab.io/#v2:EYLgtghgzgLgpgJwDQxASwDYB8CwAoAAQAYACAZQAsIEAHAGQmADoAlAVwDsY0w4BufPgIBmEgQCMANjEAmEgGF8Ab3wk1Y0ROlouC6HHEAeACoA+ABTGSMAJSr1KvOuck0AMxLmYrqK652nFyCCAHYSABYZAUDgsKJo5wBfezUUjTEpEmAAe2yMPSg4GRMLK1s0xyC1UOsfPxgE9WSYkjSRDO1deX1hAH0jM0sSbIDnSqr3T2y6nW8AD1Gq5xq5xtiSeLTmts1M2YK4PuLBqxGSAF5TYZndDhIAfhI7kAiowRb2rXqDvuESoZGFTSyzC0ygAHc0DAAMYUYFVcZLIL7O6XJ5IeFIjKkWAIC5XXFMOhwDgAcxgFAxLSxal6+I2mJciTWJGaznwiSAA===) to observe IL code for the samples above.

#### Codegen diff example:
```csharp
static bool Test(int n) => Validate(n);

static bool Validate<T>(T t) => t is int;
```
Before (for `Test`):
```asm
0000000000000000	pushq	%r14
0000000000000002	pushq	%rbx
0000000000000003	pushq	%rax
0000000000000004	movl	%edi, %ebx
0000000000000006	movabsq	$0x7fdf9c5090c0, %rax
0000000000000010	movabsq	$0x7fdfad0268e8, %r14
000000000000001a	leaq	0x1be800(%r14), %rdi
0000000000000021	movl	$0x14, %esi
0000000000000026	callq	*(%rax)
0000000000000028	movl	%ebx, 0x10(%rax)
000000000000002b	movq	(%rax), %rax
000000000000002e	cmpq	%r14, (%rax)
0000000000000031	sete	%al
0000000000000034	addq	$0x8, %rsp
0000000000000038	popq	%rbx
0000000000000039	popq	%r14
000000000000003b	retq
```
After:
```asm
0000000000000000	movb	$0x1, %al
0000000000000002	retq
```
EgorBo added a commit to mono/mono that referenced this pull request Mar 11, 2020
This PR optimizes boxing for pattern matching idioms. Basically, it copies existing optimizations from CoreCLR to mono, see [impBoxPatternMatch](https://github.com/dotnet/runtime/blob/084dd1a5aae0fbe6ed4f9164ac4ee1c5f0b2398f/src/coreclr/src/jit/importer.cpp#L5856-L6032) (one of them was recently added by me for coreclr, see dotnet/runtime#1817)
#### 1) `box + isinst + brtrue/brfalse`  -->  `ldc.i4.0/1 + brtrue/brfalse`
```csharp
public static int Case1<T>(T t)
{
    if (t is int)
        return 42;
    return 0;
}
/*
    IL_0000: ldarg.0
    IL_0001: box !!T
    IL_0006: isinst [System.Private.CoreLib]System.Int32
    IL_000b: brfalse.s IL_0010
    IL_000d: ldc.i4.s 42
    IL_000f: ret
    IL_0010: ldc.i4.0
    IL_0011: ret
*/
```
#### 2) `box + isinst + ldnull + ceq/cgt.un`  -->  `ldc.i4.0/1`
```csharp
public static bool Case2<T>(T t)
{
    return t is int;
}
/*
    IL_0000: ldarg.0
    IL_0001: box !!T
    IL_0006: isinst [System.Private.CoreLib]System.Int32
    IL_000b: ldnull
    IL_000c: cgt.un
    IL_000e: ret
*/
```
#### 3) `box + isinst + unbox.any`  -->  `nop`
```csharp
public static int Case3_1<T>(T o)
{
    if (o is int x)
        return x;
    return 0;
}

// or

public static int Case3_2<T>(T o) => o is int n ? n : 42;

// or

public static int Case3_3<T>(T o)
{
    return o switch
    {
        int n => n,
        string str => str.Length,
        _ => 0
    };
}
```
Tests can be found here: https://github.com/dotnet/runtime/tree/master/src/coreclr/tests/src/JIT/Generics/Conversions/Boxing

[Sharplab.io link](https://sharplab.io/#v2:EYLgtghgzgLgpgJwDQxASwDYB8CwAoAAQAYACAZQAsIEAHAGQmADoAlAVwDsY0w4BufPgIBmEgQCMANjEAmEgGF8Ab3wk1Y0ROlouC6HHEAeACoA+ABTGSMAJSr1KvOuck0AMxLmYrqK652nFyCCAHYSABYZAUDgsKJo5wBfezUUjTEpEmAAe2yMPSg4GRMLK1s0xyC1UOsfPxgE9WSYkjSRDO1deX1hAH0jM0sSbIDnSqr3T2y6nW8AD1Gq5xq5xtiSeLTmts1M2YK4PuLBqxGSAF5TYZndDhIAfhI7kAiowRb2rXqDvuESoZGFTSyzC0ygAHc0DAAMYUYFVcZLIL7O6XJ5IeFIjKkWAIC5XXFMOhwDgAcxgFAxLSxal6+I2mJciTWJGaznwiSAA===) to observe IL code for the samples above.

#### Codegen diff example:
```csharp
static bool Test(int n) => Validate(n);

static bool Validate<T>(T t) => t is int;
```
Before (for `Test`):
```asm
0000000000000000	pushq	%r14
0000000000000002	pushq	%rbx
0000000000000003	pushq	%rax
0000000000000004	movl	%edi, %ebx
0000000000000006	movabsq	$0x7fdf9c5090c0, %rax
0000000000000010	movabsq	$0x7fdfad0268e8, %r14
000000000000001a	leaq	0x1be800(%r14), %rdi
0000000000000021	movl	$0x14, %esi
0000000000000026	callq	*(%rax)
0000000000000028	movl	%ebx, 0x10(%rax)
000000000000002b	movq	(%rax), %rax
000000000000002e	cmpq	%r14, (%rax)
0000000000000031	sete	%al
0000000000000034	addq	$0x8, %rsp
0000000000000038	popq	%rbx
0000000000000039	popq	%r14
000000000000003b	retq
```
After:
```asm
0000000000000000	movb	$0x1, %al
0000000000000002	retq
```

Co-authored-by: EgorBo <EgorBo@users.noreply.github.com>
@Sergio0694
Copy link
Contributor

Commenting here following a conversation with @tannergooding on the UWP Community Discord Server, I was wondering whether this PR would also address some codegen other inefficiencies I've noticed or whether I should open a separate issue for them.
Asking here first to double check and avoid creating duplicate issues 😄

object is T, when T is a struct (click to expand)

image

object is T value, when T is a struct (click to expand)

image

object is T, when T is a sealed class (click to expand)

image

object is T value, when T is a sealed class (click to expand)

image

You can see that in all these 4 cases, it's possible to produce better codegen by doing checks manually. I might be missing something, but I think the JIT should be able to understand the context here and perform these optimizations automatically?

Posting just images here for brevity, if you confirm I should open a separate issue to track these I'll write the various C#/asm snippets properly with markdown code blocks of course 😊

Bonus point: case 2) could also use the no.(nullcheck|typecheck) opcode once it's supported, as that unboxing is perfectly safe there, we're already checking everything on our own.

@drieseng
Copy link
Contributor

@Sergio0694 I may be better to create a new issue rather than add a comment to a merged PR.

@EgorBo
Copy link
Member Author

EgorBo commented May 17, 2020

@Sergio0694 not sure, had a quick glance and looks like the main issue here is how JIT rotates comparisons but aside that "slow" and "fast" look the same to me

@Sergio0694
Copy link
Contributor

@EgorBo Thanks for taking a look, appreciate it! 😊

There are mainly two differences I'm seeing in all of them:

  • One less conditional branch in the "fast" version
  • Slightly smaller codegen (this might in part go away if the method is inlined though)

Consider the first example:

image

The "fast" version has a single branch just for the null check, then it simply does a compare with the method table pointer and sets the flag, whereas the other handles each condition separately 🤔

The worst offender would be the second example though, where the (T) checks seem completely redundant, but as mentioned above I guess we'll just have to wait for the runtime to support the no. prefix for that (and that seems like a separate issue as well).

@EgorBo
Copy link
Member Author

EgorBo commented May 17, 2020

@Sergio0694 feel free to file an issue. Ideally, with code, not the screenshots ;-)

@Sergio0694
Copy link
Contributor

@EgorBo Absolutely, will do!

And yeah don't worry, the issue will have actual code, the screens were just to quickly put together this exploratory question to make sure I wasn't asking a dupe of something else 😄

@EgorBo EgorBo deleted the opt-pm-boxing branch May 25, 2020 11:57
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants