Java side instance activation. #11

jonpryor · 2015-04-07T20:26:49Z

As started in commit 8c83f64.

When Java code does new JavaPeerType(), the constructor needs to create a corresponding managed peer instance.


Context: https://bugzilla.xamarin.com/show_bug.cgi?id=37630
Context: #11
Context: xamarin/monodroid@940136eb
Context: https://bugzilla.xamarin.com/show_bug.cgi?id=15542

Release builds with [Xamarin.Android + Java.Interop][0] are crashing on pre-Honeycomb devices (API-10 and earlier):

    UNHANDLED EXCEPTION: System.NotSupportedException: Unable to find the default constructor on type Android.Runtime.UncaughtExceptionHandler. Please provide the missing constructor. ---> Java.Interop.JavaLocationException: Exception of type 'Java.Interop.JavaLocationException' was thrown.
    Java.Lang.Error: Exception of type 'Java.Lang.Error' was thrown.
      --- End of managed exception stack trace ---
    java.lang.Error: Java callstack:
        at mono.android.TypeManager.n_activate(Native Method)
        at mono.android.TypeManager.Activate(TypeManager.java:7)
        at android.runtime.UncaughtExceptionHandler.<init>(UncaughtExceptionHandler.java:24)
        at mono.android.Runtime.init(Native Method)
        at mono.MonoPackageManager.LoadApplication(MonoPackageManager.java:40)
        at mono.MonoRuntimeProvider.attachInfo(MonoRuntimeProvider.java:22)
        at android.app.ActivityThread.installProvider(ActivityThread.java:4122)
        at android.app.ActivityThread.installContentProviders(ActivityThread.java:3832)
        at android.app.ActivityThread.handleBindApplication(ActivityThread.java:3788)
        at android.app.ActivityThread.access$2200(ActivityThread.java:132)
        at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1082)
        at android.os.Handler.dispatchMessage(Handler.java:99)
        at android.os.Looper.loop(Looper.java:150)
        at android.app.ActivityThread.main(ActivityThread.java:4263)
        at java.lang.reflect.Method.invokeNative(Native Method)
        at java.lang.reflect.Method.invoke(Method.java:507)
        at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:839)
        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:597)
        at dalvik.system.NativeStart.main(Native Method)
      --- End of inner exception stack trace ---
      at Java.Interop.TypeManager.n_Activate (IntPtr jnienv, IntPtr jclass, IntPtr typename_ptr, IntPtr signature_ptr, IntPtr jobject, IntPtr parameters_ptr) <0x45b540c0 + 0x00570> in <filename unknown>:0
      at (wrapper dynamic-method) System.Object:010d7c44-0c93-4b7d-b78f-129ff9bedae6 (intptr,intptr,intptr,intptr,intptr,intptr)
    UNHANDLED EXCEPTION: System.NotSupportedException: Don't know how to convert type 'System.String' to an Android.Runtime.IJavaObject.
      at Android.Runtime.JNIEnv.AssertIsJavaObject (System.Type targetType) <0x45b56130 + 0x000b4> in <filename unknown>:0
      at Android.Runtime.JNIEnv.<CreateNativeArrayElementToManaged>m__B (System.Type type, IntPtr source, Int32 index) <0x45b5db30 + 0x0001b> in <filename unknown>:0
      at Android.Runtime.JNIEnv.GetObjectArray (IntPtr array_ptr, System.Type[] element_types) <0x45b54e60 + 0x0010f> in <filename unknown>:0
      at Java.Interop.TypeManager.n_Activate (IntPtr jnienv, IntPtr jclass, IntPtr typename_ptr, IntPtr signature_ptr, IntPtr jobject, IntPtr parameters_ptr) <0x45b540c0 + 0x00283> in <filename unknown>:0
      at (wrapper dynamic-method) System.Object:010d7c44-0c93-4b7d-b78f-129ff9bedae6 (intptr,intptr,intptr,intptr,intptr,intptr)
    ...

The root cause is that [Android sucks prior to API-11][1]: You can't use JNIEnv::CallVoidMethod() or JNIEnv::CallNonvirtualVoidMethod() to invoke constructors because Dalvik raises a CloneNotSupportedException, which in turn means there's no actual point to using JNIEnv::AllocObject(), which means we need to use JNIEnv::NewObject().

JNIEnv::NewObject() sucks because it means we can enter managed code [before we've registered an instance mapping][2], which is painful.

Specifically, in order to differentiate between the "Java code created this instance" vs. "managed code created this instance", Xamarin.Android's JNIEnv.NewObject() method sets an internal flag -- TypeManager.ActivationEnabled -- so that when the "activation" code path is hit it will *bail* if managed code created the instance. This prevents the constructor from nuking the stack -- C# invokes Java invokes C# (via activation) invokes Java invokes...

The problem? Java.Interop has no analog to this infrastructure, and thus no way to check if we're within a "nested" JNIEnv::NewObject() invocation rooted in managed code. Consequently, on an API-10 device we'd try to activate the MainActivity instance...and possibly nuke the stack.

That's the cause of this message:

    Unable to find the default constructor on type Android.Runtime.UncaughtExceptionHandler.

We're trying to create the UncaughtExceptionHandler instance from managed code, which hits JNIEnv::NewObject(), which invokes the Java constructor, which hits the activation code path, which would then try to invoke the default constructor, which -- if it existed -- would go **BOOM**.

There are two plausible fixes:

 1. Drop support for API-10 and earlier devices.
 2. Mirror the Xamarin.Android JNIEnv::NewObject() "hacks".

(1) isn't really in the cards: even when we "dropped" bindings for API-4, we still continued to support *executing* on them while using the API-10 bindings.

Which leaves (2). Add a new public read-only property, JniEnvironment.WithinNewObjectScope. This property is true while JniEnvironment.Object.NewObject() is executing -- in precisely the same way that TypeManager.ActivationEnabled is false while JNIEnv.NewObject() is executing, which can (will) include nested cross-VM invocations. (Fun!)

With this support in place, Xamarin.Android can use JniEnvironment.WithinNewObjectScope instead of TypeManager.ActivationEnabled, fixing the crash.

[0]: xamarin/monodroid#317
[1]: https://code.google.com/p/android/issues/detail?id=13832
[2]: http://developer.xamarin.com/guides/android/under_the_hood/architecture/#Java_Activation
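The scope check described above can be modeled in a few lines. This is only a sketch of the idea, written in Java rather than C#, and it is not the Java.Interop implementation: the class name `NewObjectScopeSketch`, the per-thread depth counter, and the `activate()` helper are all hypothetical. The source only says that the property is true while NewObject() is executing, including nested cross-VM invocations; tracking that per thread with a counter is an assumption.

```java
import java.util.function.BooleanSupplier;

// Hypothetical model of a "within NewObject scope" check; not the real API.
final class NewObjectScopeSketch {
    // Assumption: nesting depth is tracked per thread.
    private static final ThreadLocal<Integer> depth = ThreadLocal.withInitial(() -> 0);

    // Stand-in for JniEnvironment.WithinNewObjectScope.
    static boolean withinNewObjectScope() {
        return depth.get() > 0;
    }

    // Stand-in for a managed-initiated NewObject() call. The supplier plays
    // the role of the Java constructor, which may re-enter "managed" code.
    static boolean newObject(BooleanSupplier javaConstructor) {
        depth.set(depth.get() + 1);
        try {
            return javaConstructor.getAsBoolean();
        } finally {
            depth.set(depth.get() - 1);
        }
    }

    // Stand-in for the activation code path: it bails when managed code
    // created the instance, and reports whether it would activate a peer.
    static boolean activate() {
        return !withinNewObjectScope();
    }

    public static void main(String[] args) {
        boolean fromJava = activate();                               // Java created the instance
        boolean nested = newObject(NewObjectScopeSketch::activate);  // managed code created it
        System.out.println("activated from Java: " + fromJava);
        System.out.println("activated inside NewObject: " + nested);
    }
}
```

A counter rather than a single boolean is used here so that a nested NewObject() call returning does not clear the flag for the outer call that is still running.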

…State

We want to retrofit Xamarin.Android to use Java.Interop. This in turn requires that Java.Interop be able to do everything that Xamarin.Android does, with the added complication that Java.Interop will be used in Xamarin.Android before major parts are complete (e.g. Activation [8c83f64] and Issue #11 [0]).

Fortunately we can look at how Xamarin.Android is architected to see what we might need to provide in advance before it's actually used, which leads us to Xamarin.Android's internal Java.Interop.IJavaObjectEx interface:

    interface IJavaObjectEx {
        IntPtr KeyHandle {get; set;}
        bool IsProxy {get; set;}
        bool NeedsActivation {get; set;}
        IntPtr ToLocalJniHandle ();
    }

Of those members, IJavaObjectEx.KeyHandle is already exposed as IJavaPeerable.JniIdentityHashCode, and IJavaObjectEx.ToLocalJniHandle() shouldn't be needed (it was added to incorrectly address a multithreading-related bug).

That leaves IJavaObjectEx.IsProxy and IJavaObjectEx.NeedsActivation, which are both involved in Java-side activation. In the interest of avoiding API breaks in the future, we need to support those constructs in Java.Interop *now*, even if they won't be fully utilized until later.

Additionally, those names suck for "public" names -- what do they *mean*, sans context? -- and, while things have been reasonably stable here for the past several years, I'm not entirely certain that more such states won't need to be added in the future, so we need to support IJavaObjectEx.IsProxy and IJavaObjectEx.NeedsActivation semantics in an "extensible" manner.

The solution? Yet another [Flags] enum!

    [Flags]
    public enum JniManagedPeerStates {
        None,
        Activatable = (1 << 0),
        Replaceable = (1 << 1),
    }

The use of a [Flags] enum allows us to add additional states in the future, should we need to do so.

JniManagedPeerStates.Activatable is IJavaObjectEx.NeedsActivation, and means that IJavaPeerable.PeerReference was set *before* the constructor was invoked. (Setting IJavaPeerable.PeerReference before the constructor executes is not yet done in Java.Interop.) It means that a future "proper" constructor invocation is assumed to be forthcoming, as during Java activation, IF the Java constructor virtually invokes a method which is overridden in managed code, a managed peer will need to be constructed so that the method override can be invoked, and "later" the "real" constructor will be invoked.

https://developer.xamarin.com/guides/android/under_the_hood/architecture/#Java_Activation

 1. `new Peer(...)` is invoked from Java.
 2. A super class constructor of NativePeer virtually invokes a method overridden by Peer in managed code.
 3. The Marshal Method is executed, which needs to delegate the method to *something*, and thus creates a new managed Peer instance through the activation constructor. This created managed peer instance will have the JniManagedPeerStates.Activatable state set.
 4. The managed override is executed and returns back to Java.
 5. Once all super constructors have finished, the Peer constructor executes com.xamarin.java_interop.ManagedPeer.runConstructor().
 6. ManagedPeer.runConstructor() invokes the appropriate corresponding constructor on the instance created in (3).

If the JniManagedPeerStates.Activatable state *isn't* set, then the ManagedPeer.runConstructor() call would be *ignored*.

This also means that *two constructors* will be invoked on *one instance*. The JniManagedPeerStates.Activatable state needs to be set to sanely prevent invoking constructors more than is intended on a given peer instance.

JniManagedPeerStates.Replaceable is IJavaObjectEx.IsProxy, and means that the Peer instance was created through the activation constructor. It additionally means that if two managed instances are created around the same Java instance, the non-Replaceable instance will be the one returned by JniRuntime.JniValueManager.PeekObject().

Normally, JniManagedPeerStates.Replaceable shouldn't be needed, but there is one environment where it is: Android devices which are pre-Honeycomb (API-11). On those devices, JNIEnv.AllocObject() cannot be used (8c4248b), so something very similar but not quite like the above Activation case can happen when constructing instances *from managed code*.

In "normal" use -- JNIEnv::AllocObject() works! -- managed construction is:

 1. `new Peer()` invokes managed constructor.
 2. Managed constructor calls JniPeerMembers.InstanceMethods.StartCreateInstance(), which uses JNIEnv::AllocObject() to create a Java instance *without executing the Java instance constructor*.
 3. JniRuntime.JniValueManager.Construct() adds the Java instance from (2) to the mapping table.
 4. The constructor calls JniPeerMembers.InstanceMethods.FinishCreateInstance(), which invokes the Java instance constructor, and if a Java instance constructor virtually invokes a method overridden in managed code, the marshal method will find the instance created in (1).

When JNIEnv::AllocObject() doesn't work, the above falls down:

 1. `new Peer()` invokes managed constructor.
 2. Managed constructor calls JniPeerMembers.InstanceMethods.StartCreateInstance(), which uses JNIEnv::NewObject() to create a Java instance *and executes the Java instance constructor*.
 3. If a Java constructor virtually invokes a method overridden in managed code, the marshal method will be invoked and won't find an already-existing peer for the Java instance, and thus will create one via JniRuntime.JniValueManager.CreatePeer().
 4. JniRuntime.JniValueManager.CreatePeer() will set the JniManagedPeerStates.Replaceable state on the instance created in (3).

    Note: at this point there are *two* managed peers which want to "own" the Java instance: the instance created in (1) which is still being constructed (!), and the instance created in (3).
 5. The overriding managed method returns control to the Java constructor, which eventually completes execution and returns execution to the Peer managed constructor in (1).
 6. The Peer constructor invokes JniRuntime.JniValueManager.Construct() to add the instance from (1) to the instance mapping table, a mapping which *already exists* because of (3).

There needs to be a way for (6) to replace the mapping created in (3) with the Peer instance in (1), and JniManagedPeerStates.Replaceable is how that is tracked.

(Aside: supporting platforms that have broken JNIEnv::AllocObject() implementations is a GIANT PAIN IN THE ASS.)

To use JniManagedPeerStates, update IJavaPeerable to add the following members:

    partial interface IJavaPeerable {
        JniManagedPeerStates JniManagedPeerState {get;}
        void SetJniManagedPeerState (JniManagedPeerStates value);
    }

IJavaPeerable.JniManagedPeerState is the current state of the managed peer.

IJavaPeerable.SetJniManagedPeerState() allows updating the current state of the managed peer, permitting the state to be tracked across activation constructor calls.

[0]: #11
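The two states can be illustrated with a toy mapping table. This is a sketch under stated assumptions, written in Java rather than C#, and it is not the Java.Interop code: `PeerStateSketch`, `Peer`, and the integer identity keys are hypothetical stand-ins, and the methods only loosely mirror CreatePeer(), Construct(), PeekObject(), and ManagedPeer.runConstructor(). Setting both states on a peer created on demand, and clearing Activatable once the constructor has run, are my reading of the description rather than confirmed behavior.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the peer-state bookkeeping; not the real API.
final class PeerStateSketch {
    enum PeerState { ACTIVATABLE, REPLACEABLE }

    static final class Peer {
        final String label;
        final EnumSet<PeerState> states = EnumSet.noneOf(PeerState.class);
        int constructorRuns = 0;
        Peer(String label) { this.label = label; }
    }

    // Instance mapping table: Java identity -> managed peer.
    private final Map<Integer, Peer> table = new HashMap<>();

    // Loosely models PeekObject().
    Peer peek(int javaIdentity) {
        return table.get(javaIdentity);
    }

    // Loosely models CreatePeer(): a marshal method found no peer, so one is
    // created on demand through the activation constructor.
    Peer createPeer(int javaIdentity, String label) {
        Peer peer = new Peer(label);
        peer.states.add(PeerState.ACTIVATABLE); // a "real" constructor call is still expected
        peer.states.add(PeerState.REPLACEABLE); // may be displaced by a peer under construction
        table.put(javaIdentity, peer);
        return peer;
    }

    // Loosely models Construct(): register a peer, displacing only a
    // Replaceable mapping. A non-Replaceable mapping wins.
    void construct(int javaIdentity, Peer peer) {
        Peer existing = table.get(javaIdentity);
        if (existing == null || existing.states.contains(PeerState.REPLACEABLE)) {
            table.put(javaIdentity, peer);
        }
    }

    // Loosely models runConstructor(): ignored unless the peer is Activatable.
    // Assumption: the state is cleared so the constructor runs at most once here.
    static boolean runConstructor(Peer peer) {
        if (!peer.states.remove(PeerState.ACTIVATABLE)) {
            return false;
        }
        peer.constructorRuns++;
        return true;
    }

    public static void main(String[] args) {
        PeerStateSketch values = new PeerStateSketch();

        // Broken-AllocObject path: the marshal method creates a peer first,
        // then the managed constructor registers the instance it is building.
        values.createPeer(42, "created by marshal method");
        Peer real = new Peer("created by new Peer()");
        values.construct(42, real);
        System.out.println("table holds: " + values.peek(42).label);

        // Java activation path: the on-demand peer later gets its constructor run.
        Peer activated = values.createPeer(7, "activated");
        System.out.println("first runConstructor ran: " + runConstructor(activated));
        System.out.println("second runConstructor ran: " + runConstructor(activated));
    }
}
```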

When `JniRuntime.CreationOptions.DestroyRuntimeOnDispose` is true, `JavaVM::DestroyJavaVM()` will be invoked when the `JniRuntime` instance is disposed *or* finalized.

`JreRuntime.CreateJreVM()` would *always* set `DestroyRuntimeOnDispose` to true, because it called `JNI_CreateJavaVM()`, so *of course* you'd want to destroy the Java VM, right?

Which brings us to unit tests. I don't know of any "before all test fixtures run" and "after all test fixtures run" extension points, which means:

 1. The JVM needs to be created implicitly, "on demand."
 2. There's no good way to destroy the JVM created in (1) after all tests have finished executing.

Which *really* means that the `JreRuntime` instance is *finalized*, which sets us up for the unholy trifecta of AppDomain unloads, finalizers, and JVM shutdown:

For unknown reasons, ~randomly, when running the unit tests (e.g. `make run-tests`), the test runner will *hang*, indefinitely.

Attaching `lldb` and triggering a backtrace shows the unholy trifecta:

Finalization:

    thread #4: tid = 0x403831, 0x00007fff9656bdb6 libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'tid_1403'
    ...
    frame #10: 0x00000001001ccb4a mono64`mono_gc_run_finalize(obj=<unavailable>, data=<unavailable>) + 938 at gc.c:256 [opt]
    frame #11: 0x00000001001cdd4a mono64`finalizer_thread [inlined] finalize_domain_objects + 51 at gc.c:681 [opt]
    frame #12: 0x00000001001cdd17 mono64`finalizer_thread(unused=<unavailable>) + 295 at gc.c:730 [opt]

JVM destruction:

    thread #4: tid = 0x403831, 0x00007fff9656bdb6 libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'tid_1403'
    frame #0: 0x00007fff9656bdb6 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fffa04d4728 libsystem_pthread.dylib`_pthread_cond_wait + 767
    frame #2: 0x000000010ba5bc76 libjvm.dylib`os::PlatformEvent::park() + 192
    frame #3: 0x000000010ba38e32 libjvm.dylib`ParkCommon(ParkEvent*, long) + 42
    frame #4: 0x000000010ba39708 libjvm.dylib`Monitor::IWait(Thread*, long) + 168
    frame #5: 0x000000010ba398f0 libjvm.dylib`Monitor::wait(bool, long, bool) + 246
    frame #6: 0x000000010bb3dca2 libjvm.dylib`Threads::destroy_vm() + 80
    frame #7: 0x000000010b8fd665 libjvm.dylib`jni_DestroyJavaVM + 254

AppDomain unload:

    thread #37: tid = 0x4038fb, 0x00007fff9656bdb6 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #0: 0x00007fff9656bdb6 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fffa04d4728 libsystem_pthread.dylib`_pthread_cond_wait + 767
    frame #2: 0x0000000100234a7f mono64`mono_os_cond_timedwait [inlined] mono_os_cond_wait(cond=0x0000000102016e50, mutex=0x0000000102016e10) + 11 at mono-os-mutex.h:105 [opt]
    frame #3: 0x0000000100234a74 mono64`mono_os_cond_timedwait(cond=0x0000000102016e50, mutex=0x0000000102016e10, timeout_ms=<unavailable>) + 164 at mono-os-mutex.h:120 [opt]
    frame #4: 0x0000000100234828 mono64`_wapi_handle_timedwait_signal_handle(handle=0x0000000000000440, timeout=4294967295, alertable=1, poll=<unavailable>, alerted=0x0000700000a286f4) + 536 at handles.c:1554 [opt]
    frame #5: 0x0000000100246370 mono64`wapi_WaitForSingleObjectEx(handle=<unavailable>, timeout=<unavailable>, alertable=<unavailable>) + 592 at wait.c:189 [opt]
    frame #6: 0x00000001001c832e mono64`mono_domain_try_unload [inlined] guarded_wait(timeout=4294967295, alertable=1) + 30 at appdomain.c:2390 [opt]
    frame #7: 0x00000001001c8310 mono64`mono_domain_try_unload(domain=0x000000010127ccb0, exc=0x0000700000a287a0) + 416 at appdomain.c:2482 [opt]
    frame #8: 0x00000001001c7db2 mono64`ves_icall_System_AppDomain_InternalUnload [inlined] mono_domain_unload(domain=<unavailable>) + 20 at appdomain.c:2379 [opt]
    frame #9: 0x00000001001c7d9e mono64`ves_icall_System_AppDomain_InternalUnload(domain_id=<unavailable>) + 46 at appdomain.c:2039 [opt]

This randomly results in deadlock, and hung Jenkins bots.

Fix this behavior by altering `JreRuntime.CreateJreVM()` to *not* override the value of `JniRuntime.CreationOptions.DestroyRuntimeOnDispose`. This allows the constructor of the `JreRuntime` instance to decide whether or not the JVM is destroyed.

In the case of TestJVM, we *don't* want to destroy the JVM. This prevents the JVM from being destroyed, which in turn prevents the hang during process shutdown.
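The behavior change can be pictured with a small model. This is a sketch in Java, not the C# `JreRuntime` code: `DestroyOnDisposeSketch`, `CreationOptions`, `SketchRuntime`, and both factory methods are hypothetical, and the default of `false` for the option is an assumption made only for the example. The `vmDestroyed` field stands in for an actual `JavaVM::DestroyJavaVM()` call.

```java
// Hypothetical model of the DestroyRuntimeOnDispose change; not the real API.
final class DestroyOnDisposeSketch {
    static final class CreationOptions {
        // Assumption: defaults to false in this sketch.
        boolean destroyRuntimeOnDispose = false;
    }

    static final class SketchRuntime implements AutoCloseable {
        private final boolean destroyRuntimeOnDispose;
        boolean vmDestroyed = false;

        SketchRuntime(CreationOptions options) {
            this.destroyRuntimeOnDispose = options.destroyRuntimeOnDispose;
        }

        // Stands in for dispose/finalize: destroys the VM only when asked to.
        @Override
        public void close() {
            if (destroyRuntimeOnDispose) {
                vmDestroyed = true;
            }
        }
    }

    // Old behavior: the factory always forces the option on.
    static SketchRuntime createVmForcingDestroy(CreationOptions options) {
        options.destroyRuntimeOnDispose = true;
        return new SketchRuntime(options);
    }

    // New behavior: the factory leaves the caller's choice alone.
    static SketchRuntime createVmRespectingOptions(CreationOptions options) {
        return new SketchRuntime(options);
    }

    public static void main(String[] args) {
        SketchRuntime forced = createVmForcingDestroy(new CreationOptions());
        forced.close();
        System.out.println("old factory destroyed VM: " + forced.vmDestroyed);

        // A test harness that never wants the VM torn down at shutdown.
        SketchRuntime testVm = createVmRespectingOptions(new CreationOptions());
        testVm.close();
        System.out.println("new factory destroyed VM: " + testVm.vmDestroyed);
    }
}
```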

jonpryor closed this as completed in 1c99956 Feb 23, 2016

github-actions bot locked and limited conversation to collaborators Apr 15, 2024
