-
Notifications
You must be signed in to change notification settings - Fork 533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Known samples for AndroidEnableMarshalMethods=true #8253
Comments
@grendello |
@tranb3r This is a very useful feature which gives a noticeable performance increase for huge majority of apps. The only known bad scenario we had was with Blazor apps, and those disable marshal methods by default. We enabled them in preview precisely for the reason of finding what else breaks and add it to the "disable" list if we can identify the issue. Note that marshal methods themselves are fine, the bug doesn't lie in their implementation as far as we can tell (we've had them around for nearly two years, they essentially implement in native code what is implemented in managed code when they are disabled) - the sole fact that they are that much faster triggers a race condition (at least in Blazor) of some sort. We haven't been able to determine what exactly trips up Blazor. If your sample has a pattern we can identify, we'd add it to the "disabled" list. For the gain in 90% (or more) of apps out there, it is worth enabling the feature by default and disable it explicitly (or automatically, if the pattern is identified as explained above) for the remaining 10% (or less) apps. |
@tranb3r as far as testing is concerned, we have hundreds of thousands of tests that run on CI and none of them fail with marshal methods. Without enabling the feature by default, we won't ever be able to gather information about what doesn't work "out there". |
@tranb3r have you seen the problem with anything other than |
Sure.
This is why I'm asking if the samples in this issue have been tested.
Yes. ANR when doing http requests. However I haven't been able to create a simple repro for it. |
This option should only be used in .net9 previews right? It is supposed to be somewhat broken on .net8? Because I have tried in the previous maui - sr release and also had some issues, and disabled it. |
@bcaceiro it was present in net8 but was off by default, correct. |
Superficial simplicity is unfortunately par for the course in a lot of .NET. One example of that is Blazor apps, another is async/await or LINQ. There is an awful lot of asynchronous or simply complicated code in those features that can fail under some circumstances without any obvious signals as to what's going on. Your sample uses an external service which definitely can introduce threading and/or race condition scenarios. That said, it's much, much simpler than the Blazor "hello world" app that we were investigating. I need to bring some tools up to date and hopefully your sample will give a clue as to what's failing and why.
The sad truth is that we don't have much time for manual testing. However, product previews are exactly where we can hope people like you, early testers, will discover issues and let us know about them. Alas, there are very few people following your footsteps and sometimes we learn about issues very late in the game. If we discover that marshal methods cause more trouble than we thought, we're going to disable them by default before the release and hopefully re-enable for .NET 10 - the feature is too valuable to just drop it (and we have more optimizations in mind that build on top of it).
That's another scenario involving threads and synchronization, open for race conditions. Hopefully all of them are caused by the same issue and maybe your sample app will help us pin point the problem and swat it for good. |
Here's a significantly reduced repro for the deadlock: https://github.com/filipnavara/mm-deadlock-repro. There's no MAUI or Blazor, just 50ish lines of .NET Android code in an empty template. Essentially it boils down to the .NET GC being dead-locked / live-locked in the stop-the-world stage. I have seen several patterns of this. Just run it in Release mode and it will immediately freeze before getting to display the "Hello" page. Once the app is in frozen state, the main thread is waiting in Android code in In an unreduced repro, when running under debugger, there was a native exception in UPD: Here’s a log with all the thread state transitions in MonoVM: https://gist.github.com/filipnavara/1f2e220063022cefe1375b7c87efa1ee. Seems like some code is initially run on one of the Java thread pool threads and that ends up with a wrong state. Next .NET GC from another Java thread pool thread fails to stop the world. UPD2: Fixed trace with names in MONO_STACKDATA: https://gist.github.com/filipnavara/fd997e3e9879c072935bde866ef51aa8 |
I have narrowed down the thread state transition issue to this:
At that point the thread pool thread is attached to the runtime. Yet the very next call to |
The sequence above is result of this: android/src/native/monodroid/xamarin-android-app-context.cc Lines 60 to 71 in 3fbc451
|
So, the simple fix is to replace Notably, cc @lambdageek |
Context: #8253 Context: #9343 We've had a number of reports about *hangs* when LLVM Marshal Methods are enabled (see also #8253), but we had been willing to accept having LLVM Marshal Methods enabled by default as it helps improve app startup time. However, [a customer reported a native crash][0] when LLVM Marshal Methods are enabled: > ![Thread Stack Trace][1] syscall at line 28 within libc __futex_wait_ex(void volatile*, bool, int, bool, timespec const*) at line 144 within libc sem_wait at line 108 within libc within libmonosgen-2.0.so (Build Id: …) within <anonymous:…> This native crash suggests "something going wrong" within MonoVM. We find this "scary" enough that we feel it is once again prudent to disable LLVM Marshal Methods by default in .NET 9, by setting the default value of `$(AndroidEnableMarshalMethods)`=False. Even though #9343 has a potential fix for the hang, we feel that we're too close to .NET 9 GA to validate in the wild. This feature will need to wait for .NET 10. 🙁 To explicitly enable LLVM Marshal Methods, set `$(AndroidEnableMarshalMethods)`=True in your App `.csproj`. [0]: https://discord.com/channels/732297728826277939/732297837953679412/1288758900627345463 [1]: https://cdn.discordapp.com/attachments/732297837953679412/1288758900522483712/05F6ED48-2BBA-4FC0-A54F-81BB57606500.png?ex=66f659c1&is=66f50841&hm=8f6f78f283f8c03e9fb37a452ad5d0babaaad3dd4bcc1b4887f4dcf60f651c12&
Context: dotnet#8253 Context: dotnet#9343 We've had a number of reports about *hangs* when LLVM Marshal Methods are enabled (see also dotnet#8253), but we had been willing to accept having LLVM Marshal Methods enabled by default as it helps improve app startup time. However, [a customer reported a native crash][0] when LLVM Marshal Methods are enabled: > ![Thread Stack Trace][1] syscall at line 28 within libc __futex_wait_ex(void volatile*, bool, int, bool, timespec const*) at line 144 within libc sem_wait at line 108 within libc within libmonosgen-2.0.so (Build Id: …) within <anonymous:…> This native crash suggests "something going wrong" within MonoVM. We find this "scary" enough that we feel it is once again prudent to disable LLVM Marshal Methods by default in .NET 9, by setting the default value of `$(AndroidEnableMarshalMethods)`=False. Even though dotnet#9343 has a potential fix for the hang, we feel that we're too close to .NET 9 GA to validate in the wild. This feature will need to wait for .NET 10. 🙁 To explicitly enable LLVM Marshal Methods, set `$(AndroidEnableMarshalMethods)`=True in your App `.csproj`. [0]: https://discord.com/channels/732297728826277939/732297837953679412/1288758900627345463 [1]: https://cdn.discordapp.com/attachments/732297837953679412/1288758900522483712/05F6ED48-2BBA-4FC0-A54F-81BB57606500.png?ex=66f659c1&is=66f50841&hm=8f6f78f283f8c03e9fb37a452ad5d0babaaad3dd4bcc1b4887f4dcf60f651c12&
…9346) Context: #8253 Context: #9343 We've had a number of reports about *hangs* when LLVM Marshal Methods are enabled (see also #8253), but we had been willing to accept having LLVM Marshal Methods enabled by default as it helps improve app startup time. However, [a customer reported a native crash][0] when LLVM Marshal Methods are enabled: > ![Thread Stack Trace][1] syscall at line 28 within libc __futex_wait_ex(void volatile*, bool, int, bool, timespec const*) at line 144 within libc sem_wait at line 108 within libc within libmonosgen-2.0.so (Build Id: …) within <anonymous:…> This native crash suggests "something going wrong" within MonoVM. We find this "scary" enough that we feel it is once again prudent to disable LLVM Marshal Methods by default in .NET 9, by setting the default value of `$(AndroidEnableMarshalMethods)`=False. Even though #9343 has a potential fix for the hang, we feel that we're too close to .NET 9 GA to validate in the wild. This feature will need to wait for .NET 10. 🙁 To explicitly enable LLVM Marshal Methods, set `$(AndroidEnableMarshalMethods)`=True in your App `.csproj`. [0]: https://discord.com/channels/732297728826277939/732297837953679412/1288758900627345463 [1]: https://cdn.discordapp.com/attachments/732297837953679412/1288758900522483712/05F6ED48-2BBA-4FC0-A54F-81BB57606500.png?ex=66f659c1&is=66f50841&hm=8f6f78f283f8c03e9fb37a452ad5d0babaaad3dd4bcc1b4887f4dcf60f651c12& Co-authored-by: Marek Habersack <grendel@twistedcode.net>
) Fixes: #8253 `get_function_pointer` can be called from any thread. In particular, it can be called from Android thread pool thread when method marshaling is used. We need to ensure those threads stay in the GC safe mode from the runtime perspective once we stop executing user code on those threads. `mono_thread_attach` switches the thread to GC unsafe mode and it stays there after we stop executing user code. The next GC will try to suspend the thread which cannot succeed since there's no cooperative runtime code running on that thread anymore. The fix is to use `mono_jit_thread_attach` which mimics what the iOS/macOS interop does. It creates the necessary MonoVM thread structures for JIT to work but doesn't transition the thread state.
) Fixes: #8253 `get_function_pointer` can be called from any thread. In particular, it can be called from Android thread pool thread when method marshaling is used. We need to ensure those threads stay in the GC safe mode from the runtime perspective once we stop executing user code on those threads. `mono_thread_attach` switches the thread to GC unsafe mode and it stays there after we stop executing user code. The next GC will try to suspend the thread which cannot succeed since there's no cooperative runtime code running on that thread anymore. The fix is to use `mono_jit_thread_attach` which mimics what the iOS/macOS interop does. It creates the necessary MonoVM thread structures for JIT to work but doesn't transition the thread state.
Context: #9343 Context: #8253 (comment) Test is based on the original repro from https://github.com/filipnavara/mm-deadlock-repro/
Context: #9343 Context: #8253 (comment) Test is based on the original repro from https://github.com/filipnavara/mm-deadlock-repro/
Android application type
.NET Android (net7.0-android, etc.)
Affected platform version
.NET 8 Preview 6
Description
We have turned off
AndroidEnableMarshalMethods
for .NET 8 due to too many issues surfaced during public testing. We plan on resolving those issues so it can be turned on for .NET 9.This is just an issue of known examples that fail at runtime when
AndroidEnableMarshalMethods=true
:Just linking them here for us to test later in future releases.
Note that even though we have closed the "duplicate" issues, they still represent actual known samples that should be considered for testing any marshal method fixes.
Steps to Reproduce
See links above.
Did you find any workaround?
No response
Relevant log output
No response
The text was updated successfully, but these errors were encountered: