Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Method changes behaviour after optimisation #95394

Closed
ojb500 opened this issue Nov 29, 2023 · 16 comments · Fixed by #95539
Closed

Method changes behaviour after optimisation #95394

ojb500 opened this issue Nov 29, 2023 · 16 comments · Fixed by #95539
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI bug
Milestone

Comments

@ojb500
Copy link

ojb500 commented Nov 29, 2023

Description

Apologies if this is not in the right repo.

On migrating from .NET 6 to .NET 8 we noticed some failing tests.

It would appear that the behaviour of one of our methods changes after tiered compilation.

The issue only appears in release and goes away if MethodImpl.NoOptimization is applied to the method or <TieredCompilation>false</TieredCompilation> is set in the project file.

Reproduction Steps

A minimal console app that demonstrates the problem is at: https://github.com/ojb500/optimization-problem

Expected behavior

The results of the method calls should be consistent.

Actual behavior

The method starts to return an incorrect result after a few calls.

Regression?

Last known to be working with SDK 6.0.417 and runtime 6.0.25.

Known Workarounds

No response

Configuration

OS: Windows 10 Enterprise 22H2 build 19045.3693
win-x64

(We also see this on our build servers which have a different configuration)

❯ dotnet --list-sdks
6.0.417 [C:\Program Files\dotnet\sdk]
8.0.100 [C:\Program Files\dotnet\sdk]
❯ dotnet --list-runtimes
Microsoft.AspNetCore.App 3.1.32 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 6.0.25 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 7.0.14 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 8.0.0 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.NETCore.App 3.1.32 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 6.0.15 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 6.0.25 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 7.0.14 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 8.0.0 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.WindowsDesktop.App 3.1.32 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
Microsoft.WindowsDesktop.App 6.0.15 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
Microsoft.WindowsDesktop.App 6.0.25 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
Microsoft.WindowsDesktop.App 7.0.14 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
Microsoft.WindowsDesktop.App 8.0.0 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

Other information

No response

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 29, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Nov 29, 2023
@ojb500 ojb500 changed the title Optimised Method changes behaviour after optimisation Nov 29, 2023
@EgorBo EgorBo added bug area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Nov 29, 2023
@ghost
Copy link

ghost commented Nov 29, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Apologies if this is not in the right repo.

On migrating from .NET 6 to .NET 8 we noticed some failing tests.

It would appear that the behaviour of one of our methods changes after tiered compilation.

The issue only appears in release and goes away if MethodImpl.NoOptimization is applied to the method or <TieredCompilation>false</TieredCompilation> is set in the project file.

Reproduction Steps

A minimal console app that demonstrates the problem is at: https://github.com/ojb500/optimization-problem

Expected behavior

The results of the method calls should be consistent.

Actual behavior

The method starts to return an incorrect result after a few calls.

Regression?

Last known to be working with SDK 6.0.417 and runtime 6.0.25.

Known Workarounds

No response

Configuration

OS: Windows 10 Enterprise 22H2 build 19045.3693
win-x64

(We also see this on our build servers which have a different configuration)

❯ dotnet --list-sdks
6.0.417 [C:\Program Files\dotnet\sdk]
8.0.100 [C:\Program Files\dotnet\sdk]
❯ dotnet --list-runtimes
Microsoft.AspNetCore.App 3.1.32 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 6.0.25 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 7.0.14 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 8.0.0 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
Microsoft.NETCore.App 3.1.32 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 6.0.15 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 6.0.25 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 7.0.14 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.NETCore.App 8.0.0 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
Microsoft.WindowsDesktop.App 3.1.32 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
Microsoft.WindowsDesktop.App 6.0.15 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
Microsoft.WindowsDesktop.App 6.0.25 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
Microsoft.WindowsDesktop.App 7.0.14 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
Microsoft.WindowsDesktop.App 8.0.0 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

Other information

No response

Author: ojb500
Assignees: -
Labels:

bug, area-CodeGen-coreclr, untriaged

Milestone: -

@EgorBo
Copy link
Member

EgorBo commented Nov 29, 2023

Seems to be PGO related? the virtual call looks to be the culprit

cc @dotnet/jit-contrib

@EgorBo EgorBo added this to the 9.0.0 milestone Nov 29, 2023
@EgorBo EgorBo removed the untriaged New issue has not been triaged by the area owner label Nov 29, 2023
@AndyAyersMS
Copy link
Member

@EgorBo are you looking deeper?

@EgorBo
Copy link
Member

EgorBo commented Nov 29, 2023

@EgorBo are you looking deeper?

@AndyAyersMS Haven't looked deeply yet, but created a bit more reduced repro: https://gist.github.com/EgorBo/ce16d475e6f77c9c8bbde1d71c9b0552 the bug is in the Test method (different behaviour between Tier1-PGO and other tiers).

@AndyAyersMS
Copy link
Member

I can't repro this with the reduced version, but I can with the original.

@EgorBo
Copy link
Member

EgorBo commented Nov 29, 2023

I can't repro this with the reduced version, but I can with the original.

@AndyAyersMS ah oops, here is the updated version: https://gist.githubusercontent.com/EgorBo/64d758d2322cbfdf81abb75493833444/raw/50fbd94241638c9ca138b570631036e7efd9044b/repro2.cs

@AndyAyersMS
Copy link
Member

Thanks.

Fails with OSR disabled, and since the patchpoint in Test is at offset 0 the OSR and TIer1 codegen are very similar, save for prolog/epilog.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Nov 29, 2023

Seems like the bug is in the importer, we need to spill an on-stack copy of a local struct to a temp but we do it after the call that modifies the struct, instead of before.

.NET 7 has

***** BB03
STMT00002 ( 0x00D[E-] ... 0x013 )
               [000010] S-C-G------                         *  CALL      void   Plane.ECurrent
               [000012] ----------- retbuf                  +--*  ADDR      byref 
               [000011] -------N---                         |  \--*  LCL_VAR   struct<Point, 16> V03 arg3         
               [000009] ----------- arg1                    \--*  LCL_VAR   ref    V02 arg2         

***** BB03
STMT00003 ( 0x015[E-] ... 0x01E )
               [000022] -A---------                         *  ASG       struct (copy)
               [000020] D------N---                         +--*  LCL_VAR   struct<System.ValueTuple`2[System.Double,System.Double], 16> V10 tmp1         
               [000013] -----------                         \--*  LCL_VAR   struct<System.ValueTuple`2[System.Double,System.Double], 16> V04 arg4         

***** BB03
STMT00004 ( ??? ... ??? )
               [000016] S-C-G------                         *  CALLV vt-ind void   Plane.XY
               [000014] ----------- this                    +--*  LCL_VAR   ref    V00 this         
               [000024] ----------- retbuf                  +--*  ADDR      byref 
               [000019] -------N---                         |  \--*  LCL_VAR   struct<System.ValueTuple`2[System.Double,System.Double], 16> V04 arg4         
               [000018] n---------- arg2                    \--*  OBJ       struct<Point, 16>
               [000017] -----------                            \--*  ADDR      byref 
               [000015] -------N---                               \--*  LCL_VAR   struct<Point, 16> V03 arg3   

***** BB03
STMT00005 ( ??? ... 0x025 )
               [000028] -A---------                         *  ASG       double
               [000027] D------N---                         +--*  LCL_VAR   double V08 loc1         
               [000026] -----------                         \--*  FIELD     double Item2
               [000025] -----------                            \--*  ADDR      byref 
               [000023] -------N---                               \--*  LCL_VAR   struct<System.ValueTuple`2[System.Double,System.Double], 16> V10 tmp1         

and .NET 8 has

***** BB03
STMT00002 ( 0x00D[E-] ... 0x013 )
               [000008] S-C-G------                         *  CALL      void   Plane:ECurrent(System.Collections.Generic.IEnumerator`1[Point]):Point
               [000010] ----------- retbuf                  +--*  LCL_ADDR  byref  V03 arg3         [+0]
               [000007] ----------- arg1                    \--*  LCL_VAR   ref    V02 arg2         

***** BB03
STMT00003 ( 0x015[E-] ... 0x01E )
               [000014] &-C-G------                         *  CALLV vt-ind void   Plane:XY(Point):System.ValueTuple`2[double,double]:this (exactContextHnd=0x00007FFBDBCE1C21)
               [000012] ----------- this                    +--*  LCL_VAR   ref    V00 this         
               [000017] ----------- retbuf                  +--*  LCL_ADDR  byref  V04 arg4         [+0]
               [000013] ----------- arg2                    \--*  LCL_VAR   struct<Point, 16> V03 arg3         

***** BB03
STMT00005 ( 0x015[E-] ... ??? )
               [000018] DA---------                         *  STORE_LCL_VAR struct<System.ValueTuple`2, 16> V10 tmp1         
               [000011] -----------                         \--*  LCL_VAR   struct<System.ValueTuple`2, 16> V04 arg4         

***** BB03
STMT00004 ( 0x015[E-] ... ??? )
               [000015] --C--------                         *  RET_EXPR  void  (for [000014])

***** BB03
STMT00006 ( ??? ... 0x025 )
               [000023] DA---------                         *  STORE_LCL_VAR double V08 loc1         
               [000022] n----------                         \--*  IND       double
               [000021] -----------                            \--*  FIELD_ADDR byref  System.ValueTuple`2[double,double]:Item2
               [000020] -----------                               \--*  LCL_ADDR  byref  V10 tmp1         [+0]

With tiered compilation disabled in .NET 8, we don't make the call to Plane:XY an inline candidate so the spill ends up before the call. So the bug is something like: when we split the tree for the inline candidate we don't realize the spill has to go with the call and not with the return.

Seems like this is only incidentally related to PGO and probably could happen with the right sort of inlineable regular call too. I recall we changed some of the spill analysis in .NET 8 so that might also be relevant.

@EgorBo
Copy link
Member

EgorBo commented Nov 29, 2023

I recall we changed some of the spill analysis in .NET 8 so that might also be relevant.

Perhaps, #73233 ?

@EgorBo
Copy link
Member

EgorBo commented Nov 29, 2023

I recall we changed some of the spill analysis in .NET 8 so that might also be relevant.

Perhaps, #73233 ?

Looks like it's not, I checked locally

@AndyAyersMS
Copy link
Member

I think the issue is that when we're importing the call we haven't yet added the return buffer, so we don't realize that the call interferes with the struct on the stack; we only add the return buffer later once we see how the call result is used, in impStoreStruct, and we spill then since we now see the interference, but we've already appended the call, since it was an inline candidate. So the spill ends up after the call instead of before.

Not sure of the right fix just yet, we could always put the struct return into a new temp for instance and then rely on forward sub to get rid of the temp in most cases, or maybe try and surgically spill before the call (seems tricky), or else defer splitting inline candidates out of their constituent trees.

Also not sure when this bug was introduced, it could be from relatively recent changes or it may be something pre-existing that is just exposed now because we're able to do more inlining.

@AndyAyersMS
Copy link
Member

Looks like this is an old bug that has just now surfaced:

  • This repros in 7.0 with TieredPGO=1.
  • This repros in 6.0 with TieredPGO=1, TC_QuickJitForLoops=1
  • This also repros in main

@AndyAyersMS
Copy link
Member

Seems like GDV also plays a key role here. Without GDV, we tentatively form an out-of-order looking sequence like:

t = call(...);
temp = t;
ret-expr

But if we then inline, this becomes something like

q = ...  (inline stuff, into return value temp)
temp = t;
t = q;  

where the assignment happens at the ret-expr location, and if we fail to inline, we move the call itself to the ret-expr position and do a similar assignment.

temp = t;
t = call(...);

With GDV though we "hoist" the return value assignments up and we don't get this magic re-ordering.

@AndyAyersMS
Copy link
Member

we could always put the struct return into a new temp for instance

Tried this but ran into various complications, and the copies do not go away easily... perhaps fwd sub or something should look for cases where it can de-overlap lifetimes but that doesn't happen now.

try and surgically spill before the call (seems tricky),

Have been looking at this and the mechanical part seems doable, and fixes the particular repro -- basically if we generate any spills of the local we're about to sink into the call, we move all those statements to just before the call (which will be the root of some prior statement).

However I am hitting issues with recursive spilling, if we are at the end of a block and need to spill a struct GT_RET_EXPR to a spill temp we form a store to the temp and then invoke impStoreStruct which sees the store is to a temp and tries to spill it; the second spill should not do anything as that temp won't be on the eval stack, but the spill code is blocked on entry by the recursion guard.

So I may need a prepass to avoid trying to spill in cases like this where there won't be anything to spill.

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Dec 2, 2023
If we have a call that returns a struct that is immediately assigned
to some local, the call is a GDV inline candidate, and the call is
invoked with a copy of the same local on the evaluation stack, the
spill of that local into a temp will appear in the IR stream between
the call and the ret expr, instead of before the call.

As part of our IR resolution the store to the local gets "sunk" into
the call as the hidden return buffer, so the importer ordering is
manifestly incorrect:
```
call &retbuf, ...
tmp = retbuf
...ret-expr
...tmp
```

For normal inline candidates this mis-ordering gets fixed up either
by swapping the call back into the ret expr's position, or for successful
inlines by swapping the return value store into the ret expr's position.
The JIT has behaved this way for a very long time, and the transient
mis-ordering has not lead to any noticble problem.

For GDV calls the return value stores are done earlier, just after
the call, and so the spill picks up the wrong value. GDV calls normally
only happen with PGO data. This persistent mis-ordering has been
the behavior since at least 6.0, but now that PGO is enabled by default
a much wider set of programs are at risk of running into it.

The fix here is to reorder the IR in the importer at the point the
store to the local is appended to the IR stream, as we are processing
spills for the local. If the source of the store is a GT_RET_EXPR we
keep track of these spills, find the associated GT_CALL statement, and
move the spills before the call.

There was a similar fix made for boxes in dotnet#60335, where once again the
splitting of the inline candidate call and the subsequent modification of
the call to write directly to the return buffer local led to similar
problems with GDV calls.

Fixes dotnet#95394.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Dec 2, 2023
AndyAyersMS added a commit that referenced this issue Dec 4, 2023
…95539)

If we have a call that returns a struct that is immediately assigned
to some local, the call is a GDV inline candidate, and the call is
invoked with a copy of the same local on the evaluation stack, the
spill of that local into a temp will appear in the IR stream between
the call and the ret expr, instead of before the call.

As part of our IR resolution the store to the local gets "sunk" into
the call as the hidden return buffer, so the importer ordering is
manifestly incorrect:
```
call &retbuf, ...
tmp = retbuf
...ret-expr
...tmp
```

For normal inline candidates this mis-ordering gets fixed up either
by swapping the call back into the ret expr's position, or for successful
inlines by swapping the return value store into the ret expr's position.
The JIT has behaved this way for a very long time, and the transient
mis-ordering has not lead to any noticble problem.

For GDV calls the return value stores are done earlier, just after
the call, and so the spill picks up the wrong value. GDV calls normally
only happen with PGO data. This persistent mis-ordering has been
the behavior since at least 6.0, but now that PGO is enabled by default
a much wider set of programs are at risk of running into it.

The fix here is to reorder the IR in the importer at the point the
store to the local is appended to the IR stream, as we are processing
spills for the local. If the source of the store is a GT_RET_EXPR we
keep track of these spills, find the associated GT_CALL statement, and
move the spills before the call.

There was a similar fix made for boxes in #60335, where once again the
splitting of the inline candidate call and the subsequent modification of
the call to write directly to the return buffer local led to similar
problems with GDV calls.

Fixes #95394.
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Dec 4, 2023
github-actions bot pushed a commit that referenced this issue Dec 4, 2023
If we have a call that returns a struct that is immediately assigned
to some local, the call is a GDV inline candidate, and the call is
invoked with a copy of the same local on the evaluation stack, the
spill of that local into a temp will appear in the IR stream between
the call and the ret expr, instead of before the call.

As part of our IR resolution the store to the local gets "sunk" into
the call as the hidden return buffer, so the importer ordering is
manifestly incorrect:
```
call &retbuf, ...
tmp = retbuf
...ret-expr
...tmp
```

For normal inline candidates this mis-ordering gets fixed up either
by swapping the call back into the ret expr's position, or for successful
inlines by swapping the return value store into the ret expr's position.
The JIT has behaved this way for a very long time, and the transient
mis-ordering has not lead to any noticble problem.

For GDV calls the return value stores are done earlier, just after
the call, and so the spill picks up the wrong value. GDV calls normally
only happen with PGO data. This persistent mis-ordering has been
the behavior since at least 6.0, but now that PGO is enabled by default
a much wider set of programs are at risk of running into it.

The fix here is to reorder the IR in the importer at the point the
store to the local is appended to the IR stream, as we are processing
spills for the local. If the source of the store is a GT_RET_EXPR we
keep track of these spills, find the associated GT_CALL statement, and
move the spills before the call.

There was a similar fix made for boxes in #60335, where once again the
splitting of the inline candidate call and the subsequent modification of
the call to write directly to the return buffer local led to similar
problems with GDV calls.

Fixes #95394.
@AndyAyersMS
Copy link
Member

@ojb500 thank you for reporting this.

It is fixed in our mainline and I have requested the fix to be ported back to 8.0. Best guess is that this would show up in a February servicing release. I will keep you posted on its progress.

@ojb500
Copy link
Author

ojb500 commented Dec 4, 2023

Thank you for the speedy resolution and all the great work on .NET generally in recent years 👍

jeffschwMSFT pushed a commit that referenced this issue Jan 2, 2024
…95587)

If we have a call that returns a struct that is immediately assigned
to some local, the call is a GDV inline candidate, and the call is
invoked with a copy of the same local on the evaluation stack, the
spill of that local into a temp will appear in the IR stream between
the call and the ret expr, instead of before the call.

As part of our IR resolution the store to the local gets "sunk" into
the call as the hidden return buffer, so the importer ordering is
manifestly incorrect:
```
call &retbuf, ...
tmp = retbuf
...ret-expr
...tmp
```

For normal inline candidates this mis-ordering gets fixed up either
by swapping the call back into the ret expr's position, or for successful
inlines by swapping the return value store into the ret expr's position.
The JIT has behaved this way for a very long time, and the transient
mis-ordering has not lead to any noticble problem.

For GDV calls the return value stores are done earlier, just after
the call, and so the spill picks up the wrong value. GDV calls normally
only happen with PGO data. This persistent mis-ordering has been
the behavior since at least 6.0, but now that PGO is enabled by default
a much wider set of programs are at risk of running into it.

The fix here is to reorder the IR in the importer at the point the
store to the local is appended to the IR stream, as we are processing
spills for the local. If the source of the store is a GT_RET_EXPR we
keep track of these spills, find the associated GT_CALL statement, and
move the spills before the call.

There was a similar fix made for boxes in #60335, where once again the
splitting of the inline candidate call and the subsequent modification of
the call to write directly to the return buffer local led to similar
problems with GDV calls.

Fixes #95394.

Co-authored-by: Andy Ayers <andya@microsoft.com>
@github-actions github-actions bot locked and limited conversation to collaborators Jan 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants