Assert failure: CheckInstrBytePattern(epilogBase[offset] & X86_INSTR_RET, X86_INSTR_RET, epilogBase[offset]) || CheckInstrBytePattern(epilogBase[offset], X86_INSTR_JMP_NEAR_REL32, epilogBase[offset]) || CheckInstrWord(*PTR_WORD(epilogBase + offset), X86_INSTR_w_JMP_FAR_IND_IMM) #68431
Comments
Tagging subscribers to this area: @JulieLeeMSFT
Seems likely. I don't see how we could get this far into
From the dump I got from the same issue in #70968, I can see that there are two threads running GC stress in the same function. One invoked the vectored exception handler on the runtime/src/coreclr/vm/gccover.cpp Lines 1549 to 1553 in f9999cc
@janvorli do you know why the race is only a problem on x86?
The race here seems to be the one hinted at in runtime/src/coreclr/vm/gccover.cpp, lines 1665 to 1669 in 8c2ef7f.
As @jkotas has explained to me, for partially interruptible methods there is a mismatch between where GC stress runs GCs and where normal return address hijacking would run GCs. GC stress runs the GC on the first instruction after a call returns, where, for partially interruptible methods, there is no GC info that says to protect GC pointers in return registers. This means GC stress needs to do some special work to figure out that these need to be protected. The way it does that is the following:
For partially interruptible methods there is a mismatch between where GC stress runs GCs and where normal return address hijacking would run GCs. GC stress runs the GC on the first instruction after a call returns, where, for partially interruptible methods, there is no GC info that says to protect GC pointers in return registers. This means GC stress needs to do some special work to figure out that these need to be protected. It does so as follows:

* For direct call sites, GCCoverageInfo::SprinkleBreakpoints gets the target MD of each call site and places a special illegal instruction right after the call. The GC stress handler uses that instruction to figure out which (if any) registers hold GC pointers that need protection.
* For indirect call sites, GCCoverageInfo::SprinkleBreakpoints first places an illegal instruction on the call instruction itself so that the GC stress handler will break there. Once the handler breaks, it computes the target address, gets the target MD from that, and then places the illegal instruction on the next instruction as above.

GCCoverageInfo::SprinkleBreakpoints runs right after jitting and should therefore not race with anything. However, the GC stress handler can race with any other thread accessing the same function. That makes the indirect call case problematic on x86, where unwinding reads the epilog code to work. In this particular case thread A is about to unwind through the epilog when thread B stops on an indirect call right before the epilog. Thread B then overwrites the first instruction of the epilog, causing thread A to unwind incorrectly.

UnwindStackFrame already has access to the GC stress saved code and already uses it for other unwinding. To fix the issue, make it use the saved code for unwinding epilogs as well.

Fix dotnet#68431
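To make the race easier to picture, here is a minimal, single-threaded sketch of it. It is not the runtime's code: Method, MakeMethod, PlaceGcStressMarker and CanUnwindEpilog are hypothetical names invented for illustration, the epilog walker only understands pop ebp followed by a terminator, and hlt merely stands in for whichever illegal instruction GC stress writes. Only the opcode values are real x86 encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Real x86 opcode encodings. Every type and function name below is hypothetical.
constexpr std::uint8_t kPopEbp   = 0x5D; // pop ebp
constexpr std::uint8_t kRet      = 0xC3; // ret
constexpr std::uint8_t kRetImm16 = 0xC2; // ret imm16
constexpr std::uint8_t kJmpRel32 = 0xE9; // jmp rel32 (tail call)
constexpr std::uint8_t kHlt      = 0xF4; // hlt, standing in for the GC stress marker

struct Method {
    std::vector<std::uint8_t> live;  // executable bytes, patched by GC stress
    std::vector<std::uint8_t> saved; // pristine copy taken before any patching
};

// Assumption: the saved copy is captured once, before instrumentation starts.
Method MakeMethod(const std::vector<std::uint8_t>& code) {
    return Method{code, code};
}

// Models the GC stress handler (thread B) overwriting the instruction that
// follows an indirect call. Only the live code is modified.
void PlaceGcStressMarker(Method& m, std::size_t offset) {
    m.live[offset] = kHlt;
}

// Models an x86-style unwinder (thread A) that decodes the epilog bytes:
// skip any pop ebp, then require ret, ret imm16, or jmp rel32.
bool CanUnwindEpilog(const std::vector<std::uint8_t>& code, std::size_t offset) {
    while (offset < code.size() && code[offset] == kPopEbp) {
        ++offset;
    }
    if (offset >= code.size()) {
        return false;
    }
    const std::uint8_t b = code[offset];
    return b == kRet || b == kRetImm16 || b == kJmpRel32;
}
```

For a method whose bytes are FF D0 5D C3 (call eax; pop ebp; ret), the epilog starts at offset 2. After PlaceGcStressMarker(m, 2), CanUnwindEpilog(m.live, 2) fails because the first epilog byte is now the marker, while CanUnwindEpilog(m.saved, 2) still succeeds. That is the idea behind the fix described above: decode the epilog from the saved copy rather than from bytes another thread may be patching.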
I see a failure in baseservices\threading\generics\Monitor\EnterExit12\EnterExit12.dll if run under GCStress=C on x86 in a loop, within 10 or so iterations:
with script:
I see this show up in the CI, but not frequently. E.g.,
https://dev.azure.com/dnceng/public/_build/results?buildId=1683920&view=ms.vss-test-web.build-test-results-tab&runId=46090658&resultId=101234&paneView=debug
This assert was hit before: #12953
In the debugger, it looks like we're trying to unwind from the first instruction of the prolog, which is a gcstress hlt instruction:
so maybe the unwind code doesn't expect to see hlt?
@AndyAyersMS @janvorli