Method changes behaviour after optimisation #95394
Comments
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
|
Seems to be PGO related? the virtual call looks to be the culprit cc @dotnet/jit-contrib |
@EgorBo are you looking deeper? |
@AndyAyersMS Haven't looked deeply yet, but created a bit more reduced repro: https://gist.github.com/EgorBo/ce16d475e6f77c9c8bbde1d71c9b0552 the bug is in the |
I can't repro this with the reduced version, but I can with the original. |
@AndyAyersMS ah oops, here is the updated version: https://gist.githubusercontent.com/EgorBo/64d758d2322cbfdf81abb75493833444/raw/50fbd94241638c9ca138b570631036e7efd9044b/repro2.cs |
Thanks. Fails with OSR disabled, and since the patchpoint in |
Seems like the bug is in the importer, we need to spill an on-stack copy of a local struct to a temp but we do it after the call that modifies the struct, instead of before. .NET 7 has
and .NET 8 has
With tiered compilation disabled in .NET 8, we don't make the call to Seems like this is only incidentally related to PGO and probably could happen with the right sort of inlineable regular call too. I recall we changed some of the spill analysis in .NET 8 so that might also be relevant. |
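To make the ordering problem described above concrete, here is a toy Python model (not JIT code). The names `callee`, `spill_before_call`, and `spill_after_call` are hypothetical, and a tuple stands in for the struct. It only illustrates why the copy of the local has to be spilled to a temp before the call stores its result into that same local.

```python
def callee(arg):
    # Hypothetical callee: returns a new "struct" derived from its by-value argument.
    x, y = arg
    return (x + 1, y + 1)


def spill_before_call(initial):
    # Intended order: save the on-stack copy first, then let the call
    # write its result into the local.
    frame = {"local": initial}
    frame["tmp"] = frame["local"]            # spill: tmp = local
    frame["local"] = callee(frame["local"])  # call stores its result into local
    return frame["tmp"], frame["local"]


def spill_after_call(initial):
    # Problematic order: the call has already overwritten the local,
    # so the spill captures the new value instead of the old one.
    frame = {"local": initial}
    frame["local"] = callee(frame["local"])  # call stores its result into local
    frame["tmp"] = frame["local"]            # spill happens too late
    return frame["tmp"], frame["local"]


if __name__ == "__main__":
    print(spill_before_call((1, 2)))
    print(spill_after_call((1, 2)))
```

In the first function the temp keeps the pre-call value; in the second, the temp and the local end up identical, which is the kind of wrong result the issue reports.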
Perhaps, #73233 ? |
Looks like it's not, I checked locally |
I think the issue is that when we're importing the call we haven't yet added the return buffer, so we don't realize that the call interferes with the struct on the stack; we only add the return buffer later once we see how the call result is used, in Not sure of the right fix just yet, we could always put the struct return into a new temp for instance and then rely on forward sub to get rid of the temp in most cases, or maybe try and surgically spill before the call (seems tricky), or else defer splitting inline candidates out of their constituent trees. Also not sure when this bug was introduced, it could be from relatively recent changes or it may be something pre-existing that is just exposed now because we're able to do more inlining. |
Looks like this is an old bug that has just now surfaced:
|
Seems like GDV also plays a key role here. Without GDV, we tentatively form an out-of-order looking sequence like:
But if we then inline, this becomes something like
where the assignment happens at the ret-expr location, and if we fail to inline, we move the call itself to the ret-expr position and do a similar assignment.
With GDV though we "hoist" the return value assignments up and we don't get this magic re-ordering. |
Tried this but ran into various complications, and the copies do not go away easily... perhaps fwd sub or something should look for cases where it can de-overlap lifetimes but that doesn't happen now.
Have been looking at this and the mechanical part seems doable, and fixes the particular repro -- basically if we generate any spills of the local we're about to sink into the call, we move all those statements to just before the call (which will be the root of some prior statement). However I am hitting issues with recursive spilling, if we are at the end of a block and need to spill a struct So I may need a prepass to avoid trying to spill in cases like this where there won't be anything to spill. |
If we have a call that returns a struct that is immediately assigned to some local, the call is a GDV inline candidate, and the call is invoked with a copy of the same local on the evaluation stack, the spill of that local into a temp will appear in the IR stream between the call and the ret expr, instead of before the call.

As part of our IR resolution the store to the local gets "sunk" into the call as the hidden return buffer, so the importer ordering is manifestly incorrect:

```
call &retbuf, ...
tmp = retbuf
...ret-expr
...tmp
```

For normal inline candidates this mis-ordering gets fixed up either by swapping the call back into the ret expr's position, or for successful inlines by swapping the return value store into the ret expr's position. The JIT has behaved this way for a very long time, and the transient mis-ordering has not led to any noticeable problem.

For GDV calls the return value stores are done earlier, just after the call, and so the spill picks up the wrong value. GDV calls normally only happen with PGO data. This persistent mis-ordering has been the behavior since at least 6.0, but now that PGO is enabled by default a much wider set of programs are at risk of running into it.

The fix here is to reorder the IR in the importer at the point the store to the local is appended to the IR stream, as we are processing spills for the local. If the source of the store is a GT_RET_EXPR we keep track of these spills, find the associated GT_CALL statement, and move the spills before the call.

There was a similar fix made for boxes in dotnet#60335, where once again the splitting of the inline candidate call and the subsequent modification of the call to write directly to the return buffer local led to similar problems with GDV calls.

Fixes dotnet#95394.
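The reordering described in this commit message can be sketched on a toy statement list. This is a Python illustration under stated assumptions, not the importer's actual C++ code: the `Stmt` class, its `kind` labels, and `move_spills_before_call` are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stmt:
    kind: str                      # "call", "spill", "store_ret_expr", or "other"
    text: str                      # human-readable form of the statement
    call_id: Optional[int] = None  # links a ret-expr store to its call


def move_spills_before_call(stmts: List[Stmt], call_id: int) -> List[Stmt]:
    """Move spills lying between a call and the store of its ret-expr
    to just before the call, keeping every other statement in order."""
    call_idx = next(i for i, s in enumerate(stmts)
                    if s.kind == "call" and s.call_id == call_id)
    store_idx = next(i for i, s in enumerate(stmts)
                     if s.kind == "store_ret_expr" and s.call_id == call_id)
    between = stmts[call_idx + 1:store_idx]
    spills = [s for s in between if s.kind == "spill"]
    others = [s for s in between if s.kind != "spill"]
    return (stmts[:call_idx] + spills + [stmts[call_idx]]
            + others + stmts[store_idx:])


if __name__ == "__main__":
    before = [
        Stmt("call", "call &local, ...", call_id=1),
        Stmt("spill", "tmp = local"),
        Stmt("store_ret_expr", "local = ret-expr", call_id=1),
        Stmt("other", "...tmp"),
    ]
    for stmt in move_spills_before_call(before, call_id=1):
        print(stmt.text)
```

After the move, `tmp = local` precedes the call, so the temp holds the value the local had before the call wrote through the return buffer.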
@ojb500 thank you for reporting this. It is fixed in our mainline and I have requested the fix to be ported back to 8.0. Best guess is that this would show up in a February servicing release. I will keep you posted on its progress. |
Thank you for the speedy resolution and all the great work on .NET generally in recent years 👍 |
Description
Apologies if this is not in the right repo.
On migrating from .NET 6 to .NET 8 we noticed some failing tests.
It would appear that the behaviour of one of our methods changes after tiered compilation.
The issue only appears in release and goes away if `MethodImpl.NoOptimization` is applied to the method or `<TieredCompilation>false</TieredCompilation>` is set in the project file.
Reproduction Steps
A minimal console app that demonstrates the problem is at: https://github.com/ojb500/optimization-problem
Expected behavior
The results of the method calls should be consistent.
Actual behavior
The method starts to return an incorrect result after a few calls.
Regression?
Last known to be working with SDK 6.0.417 and runtime 6.0.25.
Known Workarounds
No response
Configuration
OS: Windows 10 Enterprise 22H2 build 19045.3693
win-x64
(We also see this on our build servers which have a different configuration)
Other information
No response