Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java.Runtime.Environment] Don't Destroy the JVM by default. #65

Merged
merged 1 commit into from
Aug 17, 2016

Conversation

jonpryor
Copy link
Member

When JniRuntime.CreationOptions.DestroyRuntimeOnDispose is true,
JavaVM::DestroyJavaVM() will be invoked when the JniRuntime
instance is disposed or finalized.

JreRuntime.CreateJreVM() would always set
DestroyRuntimeOnDispose to true, because it called
JNI_CreateJavaVM(), so of course you'd want to destroy the
Java VM, right?

Which brings us to unit tests. I don't know of any "before all test
fixtures run" and "after all test fixtures run" extension points,
which means:

  1. The JVM needs to be created implicitly, "on demand."
  2. There's no good way to destroy the JVM created in (1) after all
    tests have finished executing.

Which really means that the JreRuntime instance is finalized,
which sets us up for the unholy trifecta of AppDomain unloads,
finalizers, and JVM shutdown:

For unknown reasons, ~randomly, when running the unit tests (e.g.
make run-tests), the test runner will hang, indefinitely.

Attaching lldb and triggering a backtrace shows the unholy trifecta:

Finalization:

thread #4: tid = 0x403831, 0x00007fff9656bdb6 libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'tid_1403'
  ...
  frame #10: 0x00000001001ccb4a mono64`mono_gc_run_finalize(obj=<unavailable>, data=<unavailable>) + 938 at gc.c:256 [opt]
  frame #11: 0x00000001001cdd4a mono64`finalizer_thread [inlined] finalize_domain_objects + 51 at gc.c:681 [opt]
  frame #12: 0x00000001001cdd17 mono64`finalizer_thread(unused=<unavailable>) + 295 at gc.c:730 [opt]

JVM destruction:

thread #4: tid = 0x403831, 0x00007fff9656bdb6 libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'tid_1403'
  frame #0: 0x00007fff9656bdb6 libsystem_kernel.dylib`__psynch_cvwait + 10
  frame #1: 0x00007fffa04d4728 libsystem_pthread.dylib`_pthread_cond_wait + 767
  frame #2: 0x000000010ba5bc76 libjvm.dylib`os::PlatformEvent::park() + 192
  frame #3: 0x000000010ba38e32 libjvm.dylib`ParkCommon(ParkEvent*, long) + 42
  frame #4: 0x000000010ba39708 libjvm.dylib`Monitor::IWait(Thread*, long) + 168
  frame #5: 0x000000010ba398f0 libjvm.dylib`Monitor::wait(bool, long, bool) + 246
  frame #6: 0x000000010bb3dca2 libjvm.dylib`Threads::destroy_vm() + 80
  frame #7: 0x000000010b8fd665 libjvm.dylib`jni_DestroyJavaVM + 254

AppDomain unload:

thread #37: tid = 0x4038fb, 0x00007fff9656bdb6 libsystem_kernel.dylib`__psynch_cvwait + 10
  frame #0: 0x00007fff9656bdb6 libsystem_kernel.dylib`__psynch_cvwait + 10
  frame #1: 0x00007fffa04d4728 libsystem_pthread.dylib`_pthread_cond_wait + 767
  frame #2: 0x0000000100234a7f mono64`mono_os_cond_timedwait [inlined] mono_os_cond_wait(cond=0x0000000102016e50, mutex=0x0000000102016e10) + 11 at mono-os-mutex.h:105 [opt]
  frame #3: 0x0000000100234a74 mono64`mono_os_cond_timedwait(cond=0x0000000102016e50, mutex=0x0000000102016e10, timeout_ms=<unavailable>) + 164 at mono-os-mutex.h:120 [opt]
  frame #4: 0x0000000100234828 mono64`_wapi_handle_timedwait_signal_handle(handle=0x0000000000000440, timeout=4294967295, alertable=1, poll=<unavailable>, alerted=0x0000700000a286f4) + 536 at handles.c:1554 [opt]
  frame #5: 0x0000000100246370 mono64`wapi_WaitForSingleObjectEx(handle=<unavailable>, timeout=<unavailable>, alertable=<unavailable>) + 592 at wait.c:189 [opt]
  frame #6: 0x00000001001c832e mono64`mono_domain_try_unload [inlined] guarded_wait(timeout=4294967295, alertable=1) + 30 at appdomain.c:2390 [opt]
  frame #7: 0x00000001001c8310 mono64`mono_domain_try_unload(domain=0x000000010127ccb0, exc=0x0000700000a287a0) + 416 at appdomain.c:2482 [opt]
  frame #8: 0x00000001001c7db2 mono64`ves_icall_System_AppDomain_InternalUnload [inlined] mono_domain_unload(domain=<unavailable>) + 20 at appdomain.c:2379 [opt]
  frame #9: 0x00000001001c7d9e mono64`ves_icall_System_AppDomain_InternalUnload(domain_id=<unavailable>) + 46 at appdomain.c:2039 [opt]

This randomly results in deadlock, and hung Jenkins bots.

Fix this behavior by altering JreRuntime.CreateJreVM() to not
override the value of
JniRuntime.CreationOptions.DestroyRuntimeOnDispose. This allows the
constructor of the JreRuntime instance to decide whether or not the
JVM is destroyed.

In the case of TestJVM, we don't want to destroy the JVM.

This prevents the JVM from being destroyed, which in turn prevents the
hang during process shutdown.

When `JniRuntime.CreationOptions.DestroyRuntimeOnDispose` is true,
`JavaVM::DestroyJavaVM()` will be invoked when the `JniRuntime`
instance is disposed *or* finalized.

`JreRuntime.CreateJreVM()` would *always* set
`DestroyRuntimeOnDispose` to true, because it called
`JNI_CreateJavaVM()`, so *of course* you'd want to destroy the
Java VM, right?

Which brings us to unit tests. I don't know of any "before all test
fixtures run" and "after all test fixtures run" extension points,
which means:

1. The JVM needs to be created implicitly, "on demand."
2. There's no good way to destroy the JVM created in (1) after all
    tests have finished executing.

Which *really* means that the `JreRuntime` instance is *finalized*,
which sets us up for the unholy trifecta of AppDomain unloads,
finalizers, and JVM shutdown:

For unknown reasons, ~randomly, when running the unit tests (e.g.
`make run-tests`), the test runner will *hang*, indefinitely.

Attaching `lldb` and triggering a backtrace shows the unholy trifecta:

Finalization:

	thread dotnet#4: tid = 0x403831, 0x00007fff9656bdb6 libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'tid_1403'
	  ...
	  frame dotnet#10: 0x00000001001ccb4a mono64`mono_gc_run_finalize(obj=<unavailable>, data=<unavailable>) + 938 at gc.c:256 [opt]
	  frame dotnet#11: 0x00000001001cdd4a mono64`finalizer_thread [inlined] finalize_domain_objects + 51 at gc.c:681 [opt]
	  frame dotnet#12: 0x00000001001cdd17 mono64`finalizer_thread(unused=<unavailable>) + 295 at gc.c:730 [opt]

JVM destruction:

	thread dotnet#4: tid = 0x403831, 0x00007fff9656bdb6 libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'tid_1403'
	  frame #0: 0x00007fff9656bdb6 libsystem_kernel.dylib`__psynch_cvwait + 10
	  frame dotnet#1: 0x00007fffa04d4728 libsystem_pthread.dylib`_pthread_cond_wait + 767
	  frame dotnet#2: 0x000000010ba5bc76 libjvm.dylib`os::PlatformEvent::park() + 192
	  frame dotnet#3: 0x000000010ba38e32 libjvm.dylib`ParkCommon(ParkEvent*, long) + 42
	  frame dotnet#4: 0x000000010ba39708 libjvm.dylib`Monitor::IWait(Thread*, long) + 168
	  frame dotnet#5: 0x000000010ba398f0 libjvm.dylib`Monitor::wait(bool, long, bool) + 246
	  frame dotnet#6: 0x000000010bb3dca2 libjvm.dylib`Threads::destroy_vm() + 80
	  frame dotnet#7: 0x000000010b8fd665 libjvm.dylib`jni_DestroyJavaVM + 254

AppDomain unload:

	thread dotnet#37: tid = 0x4038fb, 0x00007fff9656bdb6 libsystem_kernel.dylib`__psynch_cvwait + 10
	  frame #0: 0x00007fff9656bdb6 libsystem_kernel.dylib`__psynch_cvwait + 10
	  frame dotnet#1: 0x00007fffa04d4728 libsystem_pthread.dylib`_pthread_cond_wait + 767
	  frame dotnet#2: 0x0000000100234a7f mono64`mono_os_cond_timedwait [inlined] mono_os_cond_wait(cond=0x0000000102016e50, mutex=0x0000000102016e10) + 11 at mono-os-mutex.h:105 [opt]
	  frame dotnet#3: 0x0000000100234a74 mono64`mono_os_cond_timedwait(cond=0x0000000102016e50, mutex=0x0000000102016e10, timeout_ms=<unavailable>) + 164 at mono-os-mutex.h:120 [opt]
	  frame dotnet#4: 0x0000000100234828 mono64`_wapi_handle_timedwait_signal_handle(handle=0x0000000000000440, timeout=4294967295, alertable=1, poll=<unavailable>, alerted=0x0000700000a286f4) + 536 at handles.c:1554 [opt]
	  frame dotnet#5: 0x0000000100246370 mono64`wapi_WaitForSingleObjectEx(handle=<unavailable>, timeout=<unavailable>, alertable=<unavailable>) + 592 at wait.c:189 [opt]
	  frame dotnet#6: 0x00000001001c832e mono64`mono_domain_try_unload [inlined] guarded_wait(timeout=4294967295, alertable=1) + 30 at appdomain.c:2390 [opt]
	  frame dotnet#7: 0x00000001001c8310 mono64`mono_domain_try_unload(domain=0x000000010127ccb0, exc=0x0000700000a287a0) + 416 at appdomain.c:2482 [opt]
	  frame dotnet#8: 0x00000001001c7db2 mono64`ves_icall_System_AppDomain_InternalUnload [inlined] mono_domain_unload(domain=<unavailable>) + 20 at appdomain.c:2379 [opt]
	  frame dotnet#9: 0x00000001001c7d9e mono64`ves_icall_System_AppDomain_InternalUnload(domain_id=<unavailable>) + 46 at appdomain.c:2039 [opt]

This randomly results in deadlock, and hung Jenkins bots.

Fix this behavior by altering `JreRuntime.CreateJreVM()` to *not*
override the value of
`JniRuntime.CreationOptions.DestroyRuntimeOnDispose`. This allows the
constructor of the `JreRuntime` instance to decide whether or not the
JVM is destroyed.

In the case of TestJVM, we *don't* want to destroy the JVM.

This prevents the JVM from being destroyed, which in turn prevents the
hang during process shutdown.
@jonpryor jonpryor merged commit bee46ef into dotnet:master Aug 17, 2016
@github-actions github-actions bot locked and limited conversation to collaborators Apr 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants