Revert "Do not encode safe points with -1 offset. (#104336)" #105234
This reverts commit fa77959.
/azp run runtime-coreclr gcstress-extra
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Azure Pipelines successfully started running 1 pipeline(s).
cc @dotnet/jit-contrib @VSadov @jkotas It looks like reverting this commit fixes the gcstress-extra issues that were filed this morning.
It is very possible that JIT stress does something illegal. I'd prefer reverting, though, if that fixes the stress runs. There are quite a few GCStress failures. Is that after removing the change, or does it happen regardless?
Those are predominantly #105186
We had some green gcstress runs just recently (e.g. https://dev.azure.com/dnceng-public/public/_build/results?buildId=732442&view=results), so I guess these are new failures not related to safepoint encoding.
Also note that the change we are reverting could make #103950 a bit easier, so we might want to figure out what is wrong with JIT stress and put the change back.
This seems to be causing https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_apis/build/builds/746357/logs/28 which I'd say is a product issue. Verified by using the commit before the original change and not having any issues, then using the original change and having issues.
@VSadov, I was helping @BrennanConroy debug the issue yesterday, and I have a dump that he shared, with a stress log covering the whole run. Their app crashed within 2.5 seconds of starting, and only a single GC was executed. This might be the easiest way for you to figure out what's wrong.
This looks like it's saving a callee-save register because the method is going to be using it. If it has a GC reference I assume the caller is what needs to be marking that.
Of course you are right and I was wrong.
This is the method that keeps the reference in RBX and calls the method above and also the method that crashes with the invalid reference:
And here is the call stack at the time of the crash:
Does the above mean that rbx is not marked live going into the call, but is marked live right after? A discontinuity in nonvolatile register liveness is possible if the call throws, but then we'd have an unreachable int3 after the call. If this method was forced to be fully interruptible and was a leaf of a stackwalk, we could start from either of the locations (before the call or after it). Would both work correctly?
I am actually not sure how to read the GC info here. The fact that it has +rbx in two places, but no -rbx anywhere, confuses me.
@VSadov this is the corresponding output of the !gcinfo if that helps you:
One interesting thing is that when I ran https://dev.azure.com/dnceng-public/public/_build/results?buildId=729833&view=results
The reason for @BrennanConroy's failures appears to be simply the cross-build breaking nature of the change. The new codegen was incompatible with existing R2R images. We did update the R2R min version for 9.0, but we do not version between builds. Temporarily disabling R2R for these tests, while old R2R images might still be in use, would very likely fix the problem.
I think we still have some real issue with JIT stress on arm/arm64 though.
This reverts #104336, which appears to fix the gcstress-extra issues filed this morning.
Close #105210
Close #105212
Close #105213
Close #105214
Close #105215
Close #105216
Close #105218
Close #105219
Close #105220
Close #105221
Close #105222
Close #105223
Close #105224
Close #105225
Close #105226
Close #105227