ndk-glue: Move native activity behind lock and remove it in `onDestroy` #160

MarijnS95 · 2021-07-24T22:00:14Z

When an application returns its control flow back from main() it is generally assumed to be done polling on the looper and wishes to terminate. Android is made aware of this by calling ANativeActivity_finish, otherwise it'll continue sending input events
and eventually ANR as the app didn't consume them anymore.

NativeActivity is now behind a lock, and will be removed as soon as onDestroy is called. Due to the async nature of events, make sure to hold on to the lock received from native_activity() beforehand if you wish to use it during handling of Event::Destroy.

TODO: I assume https://developer.android.com/ndk/reference/group/native-activity#anativeactivity_finish says that ANativeActivity is invalid after this call? If so we should make fn finish(&self) take ANativeActivty by value instead (to force use of .take()) - we could even have a drop handler? Done!

dvc94ch

thanks!

dvc94ch · 2021-07-24T22:07:00Z

TODO: I assume https://developer.android.com/ndk/reference/group/native-activity#anativeactivity_finish says that ANativeActivity is invalid after this call? If so we should make fn finish(&self) take ANativeActivty by value instead (to force use of .take())

That makes sense

we could even have a drop handler?

probably not worth overdoing this

MarijnS95 · 2021-07-24T22:17:00Z

we could even have a drop handler?

probably not worth overdoing this

It would be clean. Not everyone might use ndk_glue (though it is unlikely) and it would lead to a dangling pointer (though that's the least of our concerns) and ANR with no way to recover if it's lost.

dvc94ch · 2021-07-24T22:18:43Z

Sure if you want to do it

MarijnS95 · 2021-07-24T22:22:34Z

@dvc94ch Done :)

dvc94ch · 2021-07-24T22:30:46Z

So is the drop handler guaranteed to be executed before the mutex is released?

MarijnS95 · 2021-07-24T22:32:24Z

@dvc94ch NATIVE_ACTIVITY is not protected by any mutex, otherwise we'd have to lock this for writing.

https://github.com/rust-windowing/android-ndk-rs/blob/49e8ed51220567dccac9fd1e8baa526e9cdac163/ndk-glue/src/lib.rs#L56

dvc94ch · 2021-07-24T22:33:20Z

wait didn't this used to be a RwLock?

dvc94ch · 2021-07-24T22:34:06Z

I guess not, so ignore my comments :)

MarijnS95 · 2021-07-24T22:34:17Z

Only the other bits:

https://github.com/rust-windowing/android-ndk-rs/blob/49e8ed51220567dccac9fd1e8baa526e9cdac163/ndk-glue/src/lib.rs#L49-L54

dvc94ch · 2021-07-24T22:35:08Z

but it could panic right?

https://github.com/rust-windowing/android-ndk-rs/blob/49e8ed51220567dccac9fd1e8baa526e9cdac163/ndk-glue/src/lib.rs#L59

MarijnS95 · 2021-07-24T22:35:47Z

It's probably not behind a lock because it's always "statically" set during activity execution and doesn't change (well, except when someone returns from fn main() while still having a thread alive that might access it), but all the other variables can reasonably change whenever Android feels like calling one of the callbacks.

dvc94ch · 2021-07-24T22:36:08Z

I think the static mut is only safe because it is initialized at startup and never written to

MarijnS95 · 2021-07-24T22:36:30Z

but it could panic right?

https://github.com/rust-windowing/android-ndk-rs/blob/49e8ed51220567dccac9fd1e8baa526e9cdac163/ndk-glue/src/lib.rs#L59

Yes, but only if the user calls native_activity() after returning from main(), which we can simply deem invalid and incorrect.

MarijnS95 · 2021-07-24T22:37:15Z

I think the static mut is only safe because it is initialized at startup and never written to

Exactly.

I think we might change unwrap() to .expect("Native activity has already been finished") to make this more clear.

dvc94ch · 2021-07-24T22:38:21Z

well that's the point, it's two separate instructions, so if there are multiple threads there could be a race. But I'm probably overanalyzing this.

dvc94ch · 2021-07-24T22:40:31Z

So what I'm saying is the program does this:

make static mut NATIVE_ACTIVITY None
anything could happen including calling native_activity
the drop handler is run

while 1 and 2 could actually happen at the same time meaning it's UB.

MarijnS95 · 2021-07-24T22:41:28Z

Reading or writing the pointers is only a single instruction. The point is to prevent other threads doing so while the thread holding the lock is doing different things with it. No such case with ANativeActivity as far as I'm aware.

dvc94ch · 2021-07-24T22:43:08Z

native_activity is public so it could happen. and I think winit does actually call native_activity and it runs in a separate thread.

MarijnS95 · 2021-07-24T22:44:03Z

So what I'm saying is the program does this:

make static mut NATIVE_ACTIVITY None

anything could happen including calling native_activity

the drop handler is run

while 1 and 2 could actually happen at the same time meaning it's UB.

The only one making NATIVE_ACTIVITY = None is the .take() here. Theoretically nothing else should run that calls .native_activity(). We don't do that in any of the ANativeActivityCallbacks (that could perhaps run after _finish() is called, I don't know), and if the user does that, they shouldn't have a looper listening to events in the first place as they have already returned from fn main().

MarijnS95 · 2021-07-24T22:46:09Z

native_activity is public so it could happen. and I think winit does actually call native_activity and it runs in a separate thread.

Yeah I think that's the point we're trying to make - or the convention we're trying to set - here. The crate shouldn't have any threads live after returning from fn main(), as that makes us unable to call .finish(). Since non-daemon threads are a thing we might instead require the end user to manually drop NativeActivity or run .finish()? Then we have the choice to take self by value, or keep the drop handler and recommend users to call drop(native_activity) instead. They should be able to .take() it from the Option either way (as I don't think having it around is a good idea after _finish() was called), and we should document this requirement re. #154.

MarijnS95 · 2021-07-24T22:51:03Z

We can possibly drop/.finish() in an atexit handler as well?

EDIT: I doubt the atexit handler runs unless all threads are gone (and the entire process terminates), which is likely not the case here otherwise the app would have vanished instead of becoming unresponsive. We might have something sticking around somewhere.

dvc94ch · 2021-07-24T22:55:10Z

I think taking self by reference would avoid writing to a static mut which is usually not a good idea. Presumably after calling finish the jvm stops the process and kills all threads? The alternative would be to just put the native activity in a RwLock and just hold the lock while calling finish. Or we can leave it as is and see if there are any panics/segfaults in the wild, which probably is pretty rare in practice.

dvc94ch · 2021-07-24T23:05:16Z

It's funny how the smaller the PR the larger the discussion hahha

MarijnS95 · 2021-07-25T10:20:26Z

It's funny how the smaller the PR the larger the discussion hahha

Yup the bigger churn is never an issue, but smaller controversial lines like these need to be thoroughly thought out and discussed. It's good that it's a single, isolated change in a PR anyway, and not hidden in some mega-PR :)

I think taking self by reference would avoid writing to a static mut which is usually not a good idea.

If we decide to expose drop/.finish() to the user we should definitely use by-value (move) semantics and turn the static into a lock. Allowing the user to use the NativeActivity pointer after calling ANativeActivity_finish() is probably just as bad if not worse than the race conditions that might follow.

Presumably after calling finish the jvm stops the process and kills all threads?

That's possible. Just like normal Android Activity instances the app is kept alive until explicitly calling .finish() even if no user (non-daemon) threads are spawned. That's the nature of being asynchronous and using callbacks.

In this instance automatically calling .finish() after returning from fn main() isn't the best idea, see first bulletpoint below.

The alternative would be to just put the native activity in a RwLock and just hold the lock while calling finish. Or we can leave it as is and see if there are any panics/segfaults in the wild, which probably is pretty rare in practice.

Pretty sure we'd see segfaults or other UB if the user were to continue using NativeActivity and friends after calling ANativeActivity_finish - even if Android kills all threads there's still a possibility for race conditions before that happens.

After a ~~good~~ night sleep, these are the main takeaways:

ndk_glue::main shouldn't really be called main, it's only part of the ANativeActivity_onCreate "callback". The special "finish-on-return" semantics I'm introducing here don't make much sense in that regard. Can we rename it to ndk_glue::onCreate so that it is clear that returning from this function doesn't have any shutdown guarantees?
- Existing Rust apps might already rely on this behaviour (spawn their own threads and return).
Exiting the app #154 Is not really an issue at all, we should just document more properly that .finish() has to be called whenever the users wishes to terminate.
NativeActivity should either have a drop handler, or take self by value in fn finish(self) to prevent leaking dead state/pointer back to the user.

dvc94ch · 2021-07-25T11:10:14Z

ndk_glue::main shouldn't really be called main, it's only part of the ANativeActivity_onCreate "callback". The special "finish-on-return" semantics I'm introducing here don't make much sense in that regard. Can we rename it to ndk_glue::onCreate so that it is clear that returning from this function doesn't have any shutdown guarantees?

That sounds good yes.

NativeActivity should either have a drop handler, or take self by value in fn finish(self) to prevent leaking dead state/pointer back to the user.

I guess it won't be possible to use it with the glue, as we only expose a fn native_activity() at the moment. we can wrap it in a RwLock and expose a fn finish_activity(), but that would be a breaking change and would probably require a PR to winit too.

MarijnS95 · 2021-07-25T12:22:46Z

In hindsight I'm not too certain about the proposed changes above. The looper is explicitly associated with the current thread, and returning from fn main() means the looper must have been given up. It is apparently possible to pass a looper to a different thread using ALooper_acquire() and it's probably fine then to call the various poll*() functions from a different thread too.

I guess it won't be possible to use it with the glue, as we only expose a fn native_activity() at the moment. we can wrap it in a RwLock and expose a fn finish_activity(), but that would be a breaking change and would probably require a PR to winit too.

Yes a separate read and write lock function have to be exposed, or what I was experimenting with today: fn finish_native_activity() that takes and finishes the object, without ever exposing a public function with &mut access to the Option<NativeActivity>.

Perhaps it is a good idea to collect some more opinions on this complicated and controversial change. Maybe @msiglreith and @francesca64 have some suggestions?

MarijnS95 · 2021-07-27T10:30:11Z

@msiglreith Checking out winit it never calls .finish() for the user anywhere, probably leading to a stuck application. Crates using winit are responsible for using the ndk_glue::main macro directly themselves, making them responsible for knowing about returning from main() finishing the activity if we were to push this PR through in its current form.

What do you think about the takeaway steps listed in #160 (comment)? In short:

Rename ndk_glue::main to ndk_glue::onCreate;
Use locks around NativeActivity and pass it by-value in .finish() or use a drop handler;
Document the need to call .finish() on application exit, most likely through an fn native_activity_finish() so that we don't have to move the NativeActivity to the user anywhere;
Use finish()/drop() in winit once this lands, whenever the user sets ControlFlow::Exit and forward Event::Destroyed per your description.
Note that the documentation for onDestroy says:

NativeActivity is being destroyed.

At that point it's probably still a live, valid pointer (passed as sole argument to the callback) but fn native_activity() would already be None if we were to implement proper move semantics, so we have to carefully consider what happens. If ANativeActivity_finish blocks while those callbacks are being fired we can safely set it to None after it returns and keep by-reference semantics (but &mut so that the user can't call it when retrieving an immutable handle through the glue, only when building an app without glue).
However, we're processing all these callbacks asynchronously when we should most likely wait inside onDestroy until the application (thread polling the event pipe) says: "okay, I'm done with the native activity now, go ahead!", just like the lock scenario in ndk-glue does not handle NativeWindow lifecycle correctly #117 / Notify event pipe before releasing NativeActivity resources #134.
That should be easy enough to solve with an RWLock as well, and advising any application that needs to use NativeActivity in Event::Destroy to get and store the read-lock beforehand...

msiglreith · 2021-07-29T17:49:42Z

@MarijnS95 I'm fine with these. Regarding finish vs drop I would prefer calling finish as it might be easier to discover/read than having the explicitly drop a static.

ANativeActivity_finish blocks while those callbacks are being fired we can safely set it to None after it returns and keep by-reference semantics

IIRC it just post some event to another thread/process and returns.

That should be easy enough to solve with an RWLock as well, and advising any application that needs to use NativeActivity in Event::Destroy to get and store the read-lock beforehand...

yeah. this seems to be the best way to move forward (even though a bit hacky).

MarijnS95 · 2021-08-07T15:16:43Z

@msiglreith Thanks for your view, and apologies for letting this sit unattended for a while.

@MarijnS95 I'm fine with these. Regarding finish vs drop I would prefer calling finish as it might be easier to discover/read than having the explicitly drop a static.

Assuming we never want to give the user ownership of the NativeActivity instance while still using the appropriate move semantics this is going to be a global fn finish_native_activity() anyway (the user won't be able to call fn finish(self) if it can't pull it out of the lock in the first place). Internally we can do it either way, but since finish is "magical" and not just cleaning up some object (it shuts down the application) it's probably better to keep it an explicit .take().finish(); indeed.

EDIT: Realizing now that this doesn't make much sense, given what I wrote below. .finish() should be safe to call on an (immutable) reference to NativeActivity without assuming ownership, since the onDestroy callback will only be called after that. that's - just like onNativeWindowDestroyed - probably the only sensible place to set our global handle to None.

ANativeActivity_finish blocks while those callbacks are being fired we can safely set it to None after it returns and keep by-reference semantics

IIRC it just post some event to another thread/process and returns.

So the pointer to ANativeActivity stays alive after this, and the user is still allowed to call functions on it? Seeing as onDestroy receives this pointer it should still be valid, but Android might not allow to call certain functions anymore.

That should be easy enough to solve with an RWLock as well, and advising any application that needs to use NativeActivity in Event::Destroy to get and store the read-lock beforehand...

yeah. this seems to be the best way to move forward (even though a bit hacky).

It's already a pattern for NATIVE_WINDOW so it shouldn't hurt too much. If a developer relies on this they'll quickly run into a panic if the thing is already None. Either way I'll add this to the enum documentation on Event :)

MarijnS95 · 2021-08-07T17:56:04Z

I don't think this PR is actually fixing "issue" #154 anymore. We should probably update the documentation and examples to call .finish() in order to close that. But at least we're removing the activity after onDestroy is called - which isn't what this PR initially set out to do at all 😬

dvc94ch · 2022-04-19T15:54:50Z

to be honest don't quite get the purpose of this change. also needs to be rebased.

MarijnS95 · 2022-04-19T15:56:32Z

@dvc94ch No surprise, we've been going back and forth on possible issues and implementations of this. IIRC the gist is that we can call .finish() for the user so that their NativeActivity doesn't remain dormant after exiting. But I'll have to re-read it to see if this is still relevant at all or if the final resolution is "don't care, just document that the user should call .finish() when they're done".

osdalong · 2022-10-11T06:11:17Z

hi sirs
I found the app is still alive although it calls the finish(). my version is 0.10.0, the main() code is below

pub fn main() {
println!("main...");
println!("{:?}", *APP_CONFIG);
let mut app = AppData {
destroy_requested: false,
resumed: false,
};
run(&mut app).unwrap();
println!("main.. successfully shutdown.");
// the ndk_glue api does not automatically call this and without
// it main will hang on exit, currently there seems to be no plans to
// make it automatic, refer to:
// #154
ndk_glue::native_activity().finish();
}

osdalong · 2022-10-11T07:05:52Z

may be we need use atom bomb

process::exit(1);

MarijnS95 · 2022-11-21T11:04:02Z

Rebased and made it so that this does not fix #154 anymore; it's just correctness cleanup.

Besides, it's not going to be very relevant anymore now that https://crates.io/crates/android-activity exists!

MarijnS95 · 2022-11-21T11:08:32Z

ndk-glue/src/lib.rs

+    let mut native_activity_guard = NATIVE_ACTIVITY.write();
+    let native_activity = native_activity_guard.take().unwrap();
+    assert_eq!(native_activity.ptr().as_ptr(), activity);


NativeActivity doesn't have a drop handler (that would previously call .finish() in the original revision of this PR) so nothing funky is going to happen, just that users will now get None out of fn native_activity().

`NativeActivity` is now behind a lock, and will be removed as soon as `onDestroy` is called. Due to the async nature of events, make sure to hold on to the lock received from `native_activity()` _beforehand_ if you wish to use it during handling of `Event::Destroy`.

MarijnS95 requested a review from dvc94ch July 24, 2021 22:00

MarijnS95 force-pushed the glue-finish-activity-after-main branch from 10cbc84 to 81b225f Compare July 24, 2021 22:01

dvc94ch approved these changes Jul 24, 2021

View reviewed changes

MarijnS95 force-pushed the glue-finish-activity-after-main branch from 81b225f to 68af0ab Compare July 24, 2021 22:22

MarijnS95 force-pushed the glue-finish-activity-after-main branch from 68af0ab to fd98799 Compare August 7, 2021 17:49

MarijnS95 changed the title ~~ndk-glue: Finish native activity after returning from main()~~ ndk-glue: Move native activity behind lock and remove it in onDestroy Aug 7, 2021

MarijnS95 requested review from dvc94ch and msiglreith August 7, 2021 17:50

MarijnS95 mentioned this pull request Aug 7, 2021

Exiting the app #154

Closed

MarijnS95 force-pushed the glue-finish-activity-after-main branch from fd98799 to fd239d0 Compare August 7, 2021 17:53

msiglreith approved these changes Aug 28, 2021

View reviewed changes

MarijnS95 mentioned this pull request Jan 5, 2022

Runtime problems when including multiple versions of ndk-glue #211

Closed

MarijnS95 added the status: needs discussion Direction must be ironed out label Apr 19, 2022

MarijnS95 force-pushed the glue-finish-activity-after-main branch from fd239d0 to 37d16d0 Compare November 21, 2022 11:02

MarijnS95 force-pushed the glue-finish-activity-after-main branch from 37d16d0 to 5d93933 Compare November 21, 2022 11:07

MarijnS95 commented Nov 21, 2022

View reviewed changes

MarijnS95 added difficulty: easy Likely easier than most tasks here priority: low Nice to have type: maintenance Repaying technical debt and removed status: needs discussion Direction must be ironed out labels Nov 21, 2022

MarijnS95 force-pushed the glue-finish-activity-after-main branch from 5d93933 to b27b2b7 Compare November 21, 2022 11:13

MarijnS95 merged commit d7e307e into rust-mobile:master Nov 21, 2022

MarijnS95 deleted the glue-finish-activity-after-main branch November 21, 2022 13:26

MarijnS95 mentioned this pull request Mar 16, 2023

Call ANativeActivity_finish() or GameActivity_finish() when android_main returns rust-mobile/android-activity#67

Closed

ndk-glue: Move native activity behind lock and remove it in onDestroy #160

ndk-glue: Move native activity behind lock and remove it in onDestroy #160

Conversation

MarijnS95 commented Jul 24, 2021 • edited Loading

dvc94ch left a comment

Choose a reason for hiding this comment

dvc94ch commented Jul 24, 2021

MarijnS95 commented Jul 24, 2021

dvc94ch commented Jul 24, 2021

MarijnS95 commented Jul 24, 2021

dvc94ch commented Jul 24, 2021

MarijnS95 commented Jul 24, 2021

dvc94ch commented Jul 24, 2021

dvc94ch commented Jul 24, 2021

MarijnS95 commented Jul 24, 2021

dvc94ch commented Jul 24, 2021

MarijnS95 commented Jul 24, 2021

dvc94ch commented Jul 24, 2021

MarijnS95 commented Jul 24, 2021

MarijnS95 commented Jul 24, 2021

dvc94ch commented Jul 24, 2021

dvc94ch commented Jul 24, 2021

MarijnS95 commented Jul 24, 2021

dvc94ch commented Jul 24, 2021

MarijnS95 commented Jul 24, 2021

MarijnS95 commented Jul 24, 2021 • edited Loading

MarijnS95 commented Jul 24, 2021 • edited Loading

dvc94ch commented Jul 24, 2021

dvc94ch commented Jul 24, 2021

MarijnS95 commented Jul 25, 2021

dvc94ch commented Jul 25, 2021

MarijnS95 commented Jul 25, 2021

MarijnS95 commented Jul 27, 2021

msiglreith commented Jul 29, 2021

MarijnS95 commented Aug 7, 2021 • edited Loading

MarijnS95 commented Aug 7, 2021

dvc94ch commented Apr 19, 2022

MarijnS95 commented Apr 19, 2022 • edited Loading

osdalong commented Oct 11, 2022

osdalong commented Oct 11, 2022

MarijnS95 commented Nov 21, 2022

MarijnS95 Nov 21, 2022

Choose a reason for hiding this comment

ndk-glue: Move native activity behind lock and remove it in `onDestroy` #160

ndk-glue: Move native activity behind lock and remove it in `onDestroy` #160

MarijnS95 commented Jul 24, 2021 •

edited

Loading

MarijnS95 commented Jul 24, 2021 •

edited

Loading

MarijnS95 commented Jul 24, 2021 •

edited

Loading

MarijnS95 commented Aug 7, 2021 •

edited

Loading

MarijnS95 commented Apr 19, 2022 •

edited

Loading