Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ndk-glue: Move native activity behind lock and remove it in onDestroy #160

Merged

Conversation

MarijnS95
Copy link
Member

@MarijnS95 MarijnS95 commented Jul 24, 2021

When an application returns its control flow back from main() it is generally assumed to be done polling on the looper and wishes to terminate. Android is made aware of this by calling ANativeActivity_finish, otherwise it'll continue sending input events
and eventually ANR as the app didn't consume them anymore.

NativeActivity is now behind a lock, and will be removed as soon as onDestroy is called. Due to the async nature of events, make sure to hold on to the lock received from native_activity() beforehand if you wish to use it during handling of Event::Destroy.


TODO: I assume https://developer.android.com/ndk/reference/group/native-activity#anativeactivity_finish says that ANativeActivity is invalid after this call? If so we should make fn finish(&self) take ANativeActivty by value instead (to force use of .take()) - we could even have a drop handler? Done!

@MarijnS95 MarijnS95 requested a review from dvc94ch July 24, 2021 22:00
@MarijnS95 MarijnS95 force-pushed the glue-finish-activity-after-main branch from 10cbc84 to 81b225f Compare July 24, 2021 22:01
Copy link
Contributor

@dvc94ch dvc94ch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

@dvc94ch
Copy link
Contributor

dvc94ch commented Jul 24, 2021

TODO: I assume https://developer.android.com/ndk/reference/group/native-activity#anativeactivity_finish says that ANativeActivity is invalid after this call? If so we should make fn finish(&self) take ANativeActivty by value instead (to force use of .take())

That makes sense

we could even have a drop handler?

probably not worth overdoing this

@MarijnS95
Copy link
Member Author

we could even have a drop handler?

probably not worth overdoing this

It would be clean. Not everyone might use ndk_glue (though it is unlikely) and it would lead to a dangling pointer (though that's the least of our concerns) and ANR with no way to recover if it's lost.

@dvc94ch
Copy link
Contributor

dvc94ch commented Jul 24, 2021

Sure if you want to do it

@MarijnS95 MarijnS95 force-pushed the glue-finish-activity-after-main branch from 81b225f to 68af0ab Compare July 24, 2021 22:22
@MarijnS95
Copy link
Member Author

@dvc94ch Done :)

@dvc94ch
Copy link
Contributor

dvc94ch commented Jul 24, 2021

So is the drop handler guaranteed to be executed before the mutex is released?

@MarijnS95
Copy link
Member Author

@dvc94ch NATIVE_ACTIVITY is not protected by any mutex, otherwise we'd have to lock this for writing.

https://github.com/rust-windowing/android-ndk-rs/blob/49e8ed51220567dccac9fd1e8baa526e9cdac163/ndk-glue/src/lib.rs#L56

@dvc94ch
Copy link
Contributor

dvc94ch commented Jul 24, 2021

wait didn't this used to be a RwLock?

@dvc94ch
Copy link
Contributor

dvc94ch commented Jul 24, 2021

I guess not, so ignore my comments :)

@MarijnS95
Copy link
Member Author

@dvc94ch
Copy link
Contributor

dvc94ch commented Jul 24, 2021

@MarijnS95
Copy link
Member Author

It's probably not behind a lock because it's always "statically" set during activity execution and doesn't change (well, except when someone returns from fn main() while still having a thread alive that might access it), but all the other variables can reasonably change whenever Android feels like calling one of the callbacks.

@dvc94ch
Copy link
Contributor

dvc94ch commented Jul 24, 2021

I think the static mut is only safe because it is initialized at startup and never written to

@MarijnS95
Copy link
Member Author

but it could panic right?

https://github.com/rust-windowing/android-ndk-rs/blob/49e8ed51220567dccac9fd1e8baa526e9cdac163/ndk-glue/src/lib.rs#L59

Yes, but only if the user calls native_activity() after returning from main(), which we can simply deem invalid and incorrect.

@MarijnS95
Copy link
Member Author

I think the static mut is only safe because it is initialized at startup and never written to

Exactly.

I think we might change unwrap() to .expect("Native activity has already been finished") to make this more clear.

@dvc94ch
Copy link
Contributor

dvc94ch commented Jul 24, 2021

well that's the point, it's two separate instructions, so if there are multiple threads there could be a race. But I'm probably overanalyzing this.

@dvc94ch
Copy link
Contributor

dvc94ch commented Jul 24, 2021

So what I'm saying is the program does this:

  1. make static mut NATIVE_ACTIVITY None
  2. anything could happen including calling native_activity
  3. the drop handler is run

while 1 and 2 could actually happen at the same time meaning it's UB.

@MarijnS95
Copy link
Member Author

Reading or writing the pointers is only a single instruction. The point is to prevent other threads doing so while the thread holding the lock is doing different things with it. No such case with ANativeActivity as far as I'm aware.

@dvc94ch
Copy link
Contributor

dvc94ch commented Jul 24, 2021

native_activity is public so it could happen. and I think winit does actually call native_activity and it runs in a separate thread.

@MarijnS95
Copy link
Member Author

So what I'm saying is the program does this:

  1. make static mut NATIVE_ACTIVITY None
  2. anything could happen including calling native_activity
  3. the drop handler is run

while 1 and 2 could actually happen at the same time meaning it's UB.

The only one making NATIVE_ACTIVITY = None is the .take() here. Theoretically nothing else should run that calls .native_activity(). We don't do that in any of the ANativeActivityCallbacks (that could perhaps run after _finish() is called, I don't know), and if the user does that, they shouldn't have a looper listening to events in the first place as they have already returned from fn main().

@MarijnS95
Copy link
Member Author

MarijnS95 commented Jul 24, 2021

native_activity is public so it could happen. and I think winit does actually call native_activity and it runs in a separate thread.

Yeah I think that's the point we're trying to make - or the convention we're trying to set - here. The crate shouldn't have any threads live after returning from fn main(), as that makes us unable to call .finish(). Since non-daemon threads are a thing we might instead require the end user to manually drop NativeActivity or run .finish()? Then we have the choice to take self by value, or keep the drop handler and recommend users to call drop(native_activity) instead. They should be able to .take() it from the Option either way (as I don't think having it around is a good idea after _finish() was called), and we should document this requirement re. #154.

@MarijnS95
Copy link
Member Author

MarijnS95 commented Jul 24, 2021

We can possibly drop/.finish() in an atexit handler as well?

EDIT: I doubt the atexit handler runs unless all threads are gone (and the entire process terminates), which is likely not the case here otherwise the app would have vanished instead of becoming unresponsive. We might have something sticking around somewhere.

@dvc94ch
Copy link
Contributor

dvc94ch commented Jul 24, 2021

I think taking self by reference would avoid writing to a static mut which is usually not a good idea. Presumably after calling finish the jvm stops the process and kills all threads? The alternative would be to just put the native activity in a RwLock and just hold the lock while calling finish. Or we can leave it as is and see if there are any panics/segfaults in the wild, which probably is pretty rare in practice.

@dvc94ch
Copy link
Contributor

dvc94ch commented Jul 24, 2021

It's funny how the smaller the PR the larger the discussion hahha

@MarijnS95
Copy link
Member Author

It's funny how the smaller the PR the larger the discussion hahha

Yup the bigger churn is never an issue, but smaller controversial lines like these need to be thoroughly thought out and discussed. It's good that it's a single, isolated change in a PR anyway, and not hidden in some mega-PR :)

I think taking self by reference would avoid writing to a static mut which is usually not a good idea.

If we decide to expose drop/.finish() to the user we should definitely use by-value (move) semantics and turn the static into a lock. Allowing the user to use the NativeActivity pointer after calling ANativeActivity_finish() is probably just as bad if not worse than the race conditions that might follow.

Presumably after calling finish the jvm stops the process and kills all threads?

That's possible. Just like normal Android Activity instances the app is kept alive until explicitly calling .finish() even if no user (non-daemon) threads are spawned. That's the nature of being asynchronous and using callbacks.

In this instance automatically calling .finish() after returning from fn main() isn't the best idea, see first bulletpoint below.

The alternative would be to just put the native activity in a RwLock and just hold the lock while calling finish. Or we can leave it as is and see if there are any panics/segfaults in the wild, which probably is pretty rare in practice.

Pretty sure we'd see segfaults or other UB if the user were to continue using NativeActivity and friends after calling ANativeActivity_finish - even if Android kills all threads there's still a possibility for race conditions before that happens.

After a good night sleep, these are the main takeaways:

  • ndk_glue::main shouldn't really be called main, it's only part of the ANativeActivity_onCreate "callback". The special "finish-on-return" semantics I'm introducing here don't make much sense in that regard. Can we rename it to ndk_glue::onCreate so that it is clear that returning from this function doesn't have any shutdown guarantees?
    • Existing Rust apps might already rely on this behaviour (spawn their own threads and return).
  • Exiting the app #154 Is not really an issue at all, we should just document more properly that .finish() has to be called whenever the users wishes to terminate.
  • NativeActivity should either have a drop handler, or take self by value in fn finish(self) to prevent leaking dead state/pointer back to the user.

@dvc94ch
Copy link
Contributor

dvc94ch commented Jul 25, 2021

ndk_glue::main shouldn't really be called main, it's only part of the ANativeActivity_onCreate "callback". The special "finish-on-return" semantics I'm introducing here don't make much sense in that regard. Can we rename it to ndk_glue::onCreate so that it is clear that returning from this function doesn't have any shutdown guarantees?

That sounds good yes.

NativeActivity should either have a drop handler, or take self by value in fn finish(self) to prevent leaking dead state/pointer back to the user.

I guess it won't be possible to use it with the glue, as we only expose a fn native_activity() at the moment. we can wrap it in a RwLock and expose a fn finish_activity(), but that would be a breaking change and would probably require a PR to winit too.

@MarijnS95
Copy link
Member Author

In hindsight I'm not too certain about the proposed changes above. The looper is explicitly associated with the current thread, and returning from fn main() means the looper must have been given up. It is apparently possible to pass a looper to a different thread using ALooper_acquire() and it's probably fine then to call the various poll*() functions from a different thread too.

I guess it won't be possible to use it with the glue, as we only expose a fn native_activity() at the moment. we can wrap it in a RwLock and expose a fn finish_activity(), but that would be a breaking change and would probably require a PR to winit too.

Yes a separate read and write lock function have to be exposed, or what I was experimenting with today: fn finish_native_activity() that takes and finishes the object, without ever exposing a public function with &mut access to the Option<NativeActivity>.

Perhaps it is a good idea to collect some more opinions on this complicated and controversial change. Maybe @msiglreith and @francesca64 have some suggestions?

@MarijnS95
Copy link
Member Author

@msiglreith Checking out winit it never calls .finish() for the user anywhere, probably leading to a stuck application. Crates using winit are responsible for using the ndk_glue::main macro directly themselves, making them responsible for knowing about returning from main() finishing the activity if we were to push this PR through in its current form.

What do you think about the takeaway steps listed in #160 (comment)? In short:

  • Rename ndk_glue::main to ndk_glue::onCreate;

  • Use locks around NativeActivity and pass it by-value in .finish() or use a drop handler;

  • Document the need to call .finish() on application exit, most likely through an fn native_activity_finish() so that we don't have to move the NativeActivity to the user anywhere;

  • Use finish()/drop() in winit once this lands, whenever the user sets ControlFlow::Exit and forward Event::Destroyed per your description.
    Note that the documentation for onDestroy says:

    NativeActivity is being destroyed.

    At that point it's probably still a live, valid pointer (passed as sole argument to the callback) but fn native_activity() would already be None if we were to implement proper move semantics, so we have to carefully consider what happens. If ANativeActivity_finish blocks while those callbacks are being fired we can safely set it to None after it returns and keep by-reference semantics (but &mut so that the user can't call it when retrieving an immutable handle through the glue, only when building an app without glue).
    However, we're processing all these callbacks asynchronously when we should most likely wait inside onDestroy until the application (thread polling the event pipe) says: "okay, I'm done with the native activity now, go ahead!", just like the lock scenario in ndk-glue does not handle NativeWindow lifecycle correctly #117 / Notify event pipe before releasing NativeActivity resources #134.
    That should be easy enough to solve with an RWLock as well, and advising any application that needs to use NativeActivity in Event::Destroy to get and store the read-lock beforehand...

@msiglreith
Copy link
Contributor

@MarijnS95 I'm fine with these. Regarding finish vs drop I would prefer calling finish as it might be easier to discover/read than having the explicitly drop a static.

ANativeActivity_finish blocks while those callbacks are being fired we can safely set it to None after it returns and keep by-reference semantics

IIRC it just post some event to another thread/process and returns.

That should be easy enough to solve with an RWLock as well, and advising any application that needs to use NativeActivity in Event::Destroy to get and store the read-lock beforehand...

yeah. this seems to be the best way to move forward (even though a bit hacky).

@MarijnS95
Copy link
Member Author

MarijnS95 commented Aug 7, 2021

@msiglreith Thanks for your view, and apologies for letting this sit unattended for a while.

@MarijnS95 I'm fine with these. Regarding finish vs drop I would prefer calling finish as it might be easier to discover/read than having the explicitly drop a static.

Assuming we never want to give the user ownership of the NativeActivity instance while still using the appropriate move semantics this is going to be a global fn finish_native_activity() anyway (the user won't be able to call fn finish(self) if it can't pull it out of the lock in the first place). Internally we can do it either way, but since finish is "magical" and not just cleaning up some object (it shuts down the application) it's probably better to keep it an explicit .take().finish(); indeed.

EDIT: Realizing now that this doesn't make much sense, given what I wrote below. .finish() should be safe to call on an (immutable) reference to NativeActivity without assuming ownership, since the onDestroy callback will only be called after that. that's - just like onNativeWindowDestroyed - probably the only sensible place to set our global handle to None.

ANativeActivity_finish blocks while those callbacks are being fired we can safely set it to None after it returns and keep by-reference semantics

IIRC it just post some event to another thread/process and returns.

So the pointer to ANativeActivity stays alive after this, and the user is still allowed to call functions on it? Seeing as onDestroy receives this pointer it should still be valid, but Android might not allow to call certain functions anymore.

That should be easy enough to solve with an RWLock as well, and advising any application that needs to use NativeActivity in Event::Destroy to get and store the read-lock beforehand...

yeah. this seems to be the best way to move forward (even though a bit hacky).

It's already a pattern for NATIVE_WINDOW so it shouldn't hurt too much. If a developer relies on this they'll quickly run into a panic if the thing is already None. Either way I'll add this to the enum documentation on Event :)

@MarijnS95 MarijnS95 force-pushed the glue-finish-activity-after-main branch from 68af0ab to fd98799 Compare August 7, 2021 17:49
@MarijnS95 MarijnS95 changed the title ndk-glue: Finish native activity after returning from main() ndk-glue: Move native activity behind lock and remove it in onDestroy Aug 7, 2021
@MarijnS95 MarijnS95 mentioned this pull request Aug 7, 2021
@MarijnS95 MarijnS95 force-pushed the glue-finish-activity-after-main branch from fd98799 to fd239d0 Compare August 7, 2021 17:53
@MarijnS95
Copy link
Member Author

I don't think this PR is actually fixing "issue" #154 anymore. We should probably update the documentation and examples to call .finish() in order to close that. But at least we're removing the activity after onDestroy is called - which isn't what this PR initially set out to do at all 😬

@dvc94ch
Copy link
Contributor

dvc94ch commented Apr 19, 2022

to be honest don't quite get the purpose of this change. also needs to be rebased.

@MarijnS95
Copy link
Member Author

MarijnS95 commented Apr 19, 2022

@dvc94ch No surprise, we've been going back and forth on possible issues and implementations of this. IIRC the gist is that we can call .finish() for the user so that their NativeActivity doesn't remain dormant after exiting. But I'll have to re-read it to see if this is still relevant at all or if the final resolution is "don't care, just document that the user should call .finish() when they're done".

@osdalong
Copy link

hi sirs
I found the app is still alive although it calls the finish(). my version is 0.10.0, the main() code is below

pub fn main() {
println!("main...");
println!("{:?}", *APP_CONFIG);
let mut app = AppData {
destroy_requested: false,
resumed: false,
};
run(&mut app).unwrap();
println!("main.. successfully shutdown.");
// the ndk_glue api does not automatically call this and without
// it main will hang on exit, currently there seems to be no plans to
// make it automatic, refer to:
// #154
ndk_glue::native_activity().finish();
}

@osdalong
Copy link

may be we need use atom bomb

process::exit(1);

@MarijnS95 MarijnS95 force-pushed the glue-finish-activity-after-main branch from fd239d0 to 37d16d0 Compare November 21, 2022 11:02
@MarijnS95
Copy link
Member Author

Rebased and made it so that this does not fix #154 anymore; it's just correctness cleanup.

Besides, it's not going to be very relevant anymore now that https://crates.io/crates/android-activity exists!

@MarijnS95 MarijnS95 force-pushed the glue-finish-activity-after-main branch from 37d16d0 to 5d93933 Compare November 21, 2022 11:07
Comment on lines +341 to +343
let mut native_activity_guard = NATIVE_ACTIVITY.write();
let native_activity = native_activity_guard.take().unwrap();
assert_eq!(native_activity.ptr().as_ptr(), activity);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NativeActivity doesn't have a drop handler (that would previously call .finish() in the original revision of this PR) so nothing funky is going to happen, just that users will now get None out of fn native_activity().

@MarijnS95 MarijnS95 added difficulty: easy Likely easier than most tasks here priority: low Nice to have type: maintenance Repaying technical debt and removed status: needs discussion Direction must be ironed out labels Nov 21, 2022
`NativeActivity` is now behind a lock, and will be removed as soon as
`onDestroy` is called. Due to the async nature of events, make sure to
hold on to the lock received from `native_activity()` _beforehand_ if
you wish to use it during handling of `Event::Destroy`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
difficulty: easy Likely easier than most tasks here priority: low Nice to have type: maintenance Repaying technical debt
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants