JIT: Runtime_82535 crashes on osx x64 #89585

Closed
markples opened this issue Jul 27, 2023 · 7 comments · Fixed by #90967

Comments

@markples
Member

Failure was on osx x64, but the original issue suggests that it could fail on multiple platforms and architectures.

#82663 added test JIT/Regression/JitBlue/Runtime_82535, which failed under mono and coreclr. mono was fixed in that PR. coreclr stopped failing, but it appears to be because the test stopped running. #89521 fixes the issue that stopped it from running and exposes the failure again.

07:07:35.703 Running test: JIT/Regression/JitBlue/Runtime_82535/Runtime_82535/Runtime_82535.dll
[createdump] Invalid process id: task_for_pid(986) FAILED (os/kern) failure (5)
[createdump] This failure may be because createdump or the application is not properly signed and entitled.
[createdump] Target process is alive
[createdump] Failure took 16ms
waitpid() returned successfully (wstatus 0000ff00) WEXITSTATUS ff WTERMSIG 0
App Exit Code: 22
Expected: 100
Actual: 22
END EXECUTION - FAILED
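
As a reading aid for the `waitpid()` line in the log above, here is a minimal sketch of how a wait status word such as `0000ff00` splits into the exit status and terminating signal fields. It assumes the conventional encoding used on Linux and macOS (exit status in bits 8-15, signal in the low 7 bits) and mirrors that arithmetic directly rather than calling the `WEXITSTATUS`/`WTERMSIG` macros; `WaitStatus` and `decode_wait_status` are hypothetical names, not part of the runtime or the test harness.

```cpp
#include <cassert>

// Fields of a wait status word, assuming the conventional Linux/macOS layout.
struct WaitStatus {
    int exit_status;  // the value WEXITSTATUS would report
    int term_signal;  // the value WTERMSIG would report
};

// Hypothetical helper: extracts both fields with plain shifts and masks.
inline WaitStatus decode_wait_status(int wstatus) {
    WaitStatus result;
    result.exit_status = (wstatus >> 8) & 0xff;  // bits 8-15
    result.term_signal = wstatus & 0x7f;         // low 7 bits
    return result;
}

// Checks the value from the log: wstatus 0000ff00.
inline void check_logged_status() {
    const WaitStatus s = decode_wait_status(0x0000ff00);
    assert(s.exit_status == 0xff);  // matches "WEXITSTATUS ff"
    assert(s.term_signal == 0);     // matches "WTERMSIG 0"
}
```

Under that assumption, the logged status decodes to an exit status of 0xff with no terminating signal, which agrees with the two fields printed on the same log line.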
@markples added the os-mac-os-x, arch-x64, and area-CodeGen-coreclr labels Jul 27, 2023
@ghost added the untriaged label Jul 27, 2023
@ghost

ghost commented Jul 27, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@JulieLeeMSFT added this to the 8.0.0 milestone Jul 28, 2023
@ghost removed the untriaged label Jul 28, 2023
@JulieLeeMSFT added the Priority:2 label Jul 31, 2023
@JulieLeeMSFT assigned BruceForstall and unassigned TIHan Aug 7, 2023
@BruceForstall
Member

Test PR re-enabling the test: #90290

@BruceForstall
Member

The test PR verified it still fails.

@BruceForstall
Member

This test hits a null reference exception inside a write barrier helper (as part of copying a struct). The VM then hits another exception (EXC_BAD_ACCESS) while processing the first one:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000102fcd977 libcoreclr.dylib`EECodeInfo::GetFunctionEntry() + 39
    frame #1: 0x000000010330c34d libcoreclr.dylib`UnwindManagedExceptionPass1(PAL_SEHException&, _CONTEXT*) + 285
    frame #2: 0x000000010330cecd libcoreclr.dylib`DispatchManagedException(PAL_SEHException&, bool) + 541
    frame #3: 0x0000000103300061 libcoreclr.dylib`HandleHardwareException(PAL_SEHException*) + 1105
    frame #4: 0x0000000102dc68fd libcoreclr.dylib`SEHProcessException(PAL_SEHException*) + 253
    frame #5: 0x0000000102e5e5f1 libcoreclr.dylib`PAL_DispatchExceptionInner(_CONTEXT*, _EXCEPTION_RECORD*) + 289
    frame #6: 0x0000000102e5e48a libcoreclr.dylib`PAL_DispatchException + 74
    frame #7: 0x0000000102e5d914 libcoreclr.dylib`PAL_DispatchExceptionWrapper + 10
    frame #8: 0x00000001034507b0 libcoreclr.dylib`UpdateCardBundle_ByRefWriteBarrier + 16
    frame #9: 0x000000011d7e4ff7
    frame #10: 0x000000011d7e4e84
    frame #11: 0x00000001034500a1 libcoreclr.dylib`CallDescrWorkerInternal + 124
    frame #12: 0x00000001030ee7cb libcoreclr.dylib`CallDescrWorkerWithHandler(CallDescrData*, int) + 315
    frame #13: 0x00000001030ef3fe libcoreclr.dylib`MethodDescCallSite::CallTargetWorker(unsigned long long const*, unsigned long long*, int) + 2174
    frame #14: 0x0000000102ea08d2 libcoreclr.dylib`MethodDescCallSite::Call_RetArgSlot(unsigned long long const*) + 162
    frame #15: 0x0000000102ea06ab libcoreclr.dylib`RunMainInternal(Param*) + 667
    frame #16: 0x0000000102ea03e9 libcoreclr.dylib`RunMain(MethodDesc*, short, int*, REF<PtrArray>*)::$_1::operator()(Param*) const::'lambda'(Param*)::operator()(Param*) const + 25
    frame #17: 0x0000000102e9b786 libcoreclr.dylib`RunMain(MethodDesc*, short, int*, REF<PtrArray>*)::$_1::operator()(Param*) const + 70
    frame #18: 0x0000000102e9b565 libcoreclr.dylib`RunMain(MethodDesc*, short, int*, REF<PtrArray>*) + 453
    frame #19: 0x0000000102e9bb91 libcoreclr.dylib`Assembly::ExecuteMainMethod(REF<PtrArray>*, int) + 529
    frame #20: 0x0000000102f0efbc libcoreclr.dylib`CorHost2::ExecuteAssembly(unsigned int, char16_t const*, int, char16_t const**, unsigned int*) + 2044
    frame #21: 0x0000000102e686d6 libcoreclr.dylib`coreclr_execute_assembly + 342
    frame #22: 0x00000001000054ad corerun`run(config=0x00007ff7bfeff1d8) at corerun.cpp:429:18
    frame #23: 0x0000000100002684 corerun`main(argc=2, argv=0x00007ff7bfeff548) at corerun.cpp:624:21
    frame #24: 0x00007ff8137f241f dyld`start + 1903

In this case, we're calling LazyGetFunctionEntry:

PTR_RUNTIME_FUNCTION EECodeInfo::GetFunctionEntry()
{
    LIMITED_METHOD_CONTRACT;
    SUPPORTS_DAC;

    if (m_pFunctionEntry == NULL)
        m_pFunctionEntry = m_pJM->LazyGetFunctionEntry(this);
    return m_pFunctionEntry;
}

and m_pJM is null.
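
To make the failure mode concrete, here is a minimal standalone sketch of the same lazy-initialization pattern with the manager pointer checked before use. Every type and member here (`RuntimeFunction`, `JitManagerSketch`, `CodeInfoSketch`, `GetFunctionEntryChecked`) is a hypothetical stand-in written for illustration; none of it is CoreCLR code, and it is not the fix that was adopted for this issue. It only shows that the lazy path is safe only when the object was initialized for an address inside managed code.

```cpp
#include <cassert>

// Hypothetical stand-in for the unwind/function table entry.
struct RuntimeFunction {
    unsigned begin_rva;
};

struct CodeInfoSketch;

// Hypothetical stand-in for the JIT manager that owns function entries.
struct JitManagerSketch {
    RuntimeFunction entry{0x1000};
    RuntimeFunction* LazyGetFunctionEntry(CodeInfoSketch*) { return &entry; }
};

// Hypothetical stand-in for the code-info object from the excerpt above.
struct CodeInfoSketch {
    JitManagerSketch* m_pJM = nullptr;            // null when the IP was not in managed code
    RuntimeFunction* m_pFunctionEntry = nullptr;  // filled in lazily

    bool IsValid() const { return m_pJM != nullptr; }

    // Same lazy lookup, but refuses to dereference a null manager.
    RuntimeFunction* GetFunctionEntryChecked() {
        if (!IsValid())
            return nullptr;
        if (m_pFunctionEntry == nullptr)
            m_pFunctionEntry = m_pJM->LazyGetFunctionEntry(this);
        return m_pFunctionEntry;
    }
};
```

In this sketch, a `CodeInfoSketch` whose `m_pJM` was never set returns `nullptr` instead of faulting. In the real trace the question is why the object ended up without a manager in the first place, which is what the rest of the thread investigates.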

@BruceForstall removed their assignment Aug 10, 2023
@BruceForstall added the area-ExceptionHandling-coreclr and untriaged labels and removed the area-CodeGen-coreclr and Priority:2 labels Aug 10, 2023
@BruceForstall
Member

@mangod9 This looks like an exception handling bug in the VM.

@jeffschwMSFT removed the untriaged label Aug 10, 2023
@janvorli
Member

Based on the stack trace above, I think the issue is caused by the labels inside JIT_ByRefWriteBarrier not being wrapped in the LOCAL_LABEL macro. libunwind then doesn't find the correct unwind info for the function as a whole: it treats the closest label before the current IP as the actual function start. I was hitting similar issues during other work on macOS x64.
I am working on a fix.
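
The "closest label before the current IP" behavior can be illustrated with a toy model. The sketch below is not libunwind code. It assumes a sorted symbol table and a lookup that picks the last symbol at or before the instruction pointer. The addresses and the helpers (`Symbol`, `closest_symbol_before`, `table_with_visible_label`, `table_with_local_label`) are hypothetical; only the two symbol names come from the stack trace and comment above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One entry of a hypothetical symbol table, kept sorted by address.
struct Symbol {
    std::uint64_t address;
    std::string name;
};

// Returns the closest symbol at or before `ip`, or nullptr if none precedes it.
const Symbol* closest_symbol_before(const std::vector<Symbol>& sorted_symbols,
                                    std::uint64_t ip) {
    auto it = std::upper_bound(
        sorted_symbols.begin(), sorted_symbols.end(), ip,
        [](std::uint64_t value, const Symbol& s) { return value < s.address; });
    if (it == sorted_symbols.begin())
        return nullptr;
    return &*(it - 1);
}

// Label emitted as an ordinary symbol: it appears in the table (made-up addresses).
std::vector<Symbol> table_with_visible_label() {
    std::vector<Symbol> table;
    table.push_back(Symbol{0x1000, "JIT_ByRefWriteBarrier"});
    table.push_back(Symbol{0x1040, "UpdateCardBundle_ByRefWriteBarrier"});
    return table;
}

// Label emitted as a local label: only the real function entry remains.
std::vector<Symbol> table_with_local_label() {
    std::vector<Symbol> table;
    table.push_back(Symbol{0x1000, "JIT_ByRefWriteBarrier"});
    return table;
}
```

With the visible-label table, looking up `0x1050` resolves to `UpdateCardBundle_ByRefWriteBarrier`, which resembles frame #8 in the trace (`UpdateCardBundle_ByRefWriteBarrier + 16`). With the local-label table, the same address resolves to `JIT_ByRefWriteBarrier`, the function that actually carries the unwind info.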

@ghost added the in-pr label Aug 22, 2023
janvorli added a commit to janvorli/runtime that referenced this issue Aug 22, 2023
Local labels in the JIT_ByRefWriteBarrier were not wrapped in the
LOCAL_LABEL macro. That causes incorrect unwinding in case an
access violation occurs inside of this function. The closest label
before the failure point is considered to be the actual function,
but it obviously doesn't have the right unwind info.

Close dotnet#89585
janvorli added a commit that referenced this issue Aug 23, 2023
@ghost removed the in-pr label Aug 23, 2023
@janvorli
Member

Reopening until the port to .NET 8 is done.

@janvorli reopened this Aug 23, 2023
github-actions bot pushed a commit that referenced this issue Aug 24, 2023
carlossanlop pushed a commit that referenced this issue Aug 24, 2023

Co-authored-by: Jan Vorlicek <janvorli@microsoft.com>
@ghost locked as resolved and limited conversation to collaborators Sep 24, 2023
6 participants