Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deferred render init #4913

Closed
wants to merge 4 commits into from
Closed

Deferred render init #4913

wants to merge 4 commits into from

Conversation

rib
Copy link
Contributor

@rib rib commented Jun 3, 2022

Objective

This work was primarily done as part of an effort to get Bevy running on Android.

Currently, the biggest barrier to running Bevy apps on Android is that Android apps don't have a SurfaceView when they first start and we have to wait until we get a Winit Resumed event before we have a valid SurfaceView that can be used for rendering.

Instead of assuming it's valid to create a native window and graphics context immediately while initializing a Bevy application the aim of this PR is to ring fence the code that handles the initialization of render state so it's possible to control (defer) when it's called.

Solution

This introduces the idea of render_init callbacks for an App that can be registered within plugin build() functions and the App runner can decide when it's ready to initialize the render state and call app.render_init() which will invoke all the init functions in the same order that they were registered.

It is then a very mechanical change to move all the plugin build() code that pertains to initializing render state into an add_render_init() callback. In general all code that depends on the RenderApp sub app has been moved into a render init callback.

For example plugin build() code like this:

if let Ok(render_app) = app.get_sub_app_mut(RenderApp) {
    render_app.add_system_to_stage(
        RenderStage::Extract,
        extract_text2d_sprite.after(SpriteSystem::ExtractSprites),
    );
}

becomes:

app.add_render_init(move |app| {
    if let Ok(render_app) = app.get_sub_app_mut(RenderApp) {
        render_app.add_system_to_stage(
            RenderStage::Extract,
            extract_text2d_sprite.after(SpriteSystem::ExtractSprites),
        );
    }
});

and bevy_winit is updated to call app.render_init(); when it's ready to initialize all render state.

Discussion

Due to the fairly large systematic change to wrap the render init code in add_render_init callbacks I was quite keen to share this PR early (even though I can't say I have tested it very heavily) in the hope of potentially being able to avoid too much rebasing if we decide this is an acceptable solution for deferred render state initialization. (This is also why I've avoided conflating this PR with other Android specific fixes/changes.)

I also have a branch I can share based on this that gets a minimal Bevy application running on Android (and the same app has also been tested on Windows), so I'm at least confident that this can help with Android enabling.


Changelog

Added

  • Added App render_init functions for deferring render state initialization

Migration Guide

Any plugin build() code that depends on RenderApp sub app state should be moved into a render init callback, via app.add_render_init(), like:

app.add_render_init(move |app| {
    if let Ok(render_app) = app.get_sub_app_mut(RenderApp) {
        render_app.add_system_to_stage(
            RenderStage::Extract,
            extract_text2d_sprite.after(SpriteSystem::ExtractSprites),
        );
    }
});

@alice-i-cecile alice-i-cecile added A-Rendering Drawing game state to the screen A-Windowing Platform-agnostic interface layer to run your app in O-Android Specific to the Android mobile operating system C-Bug An unexpected or incorrect behavior P-High This is particularly urgent, and deserves immediate attention labels Jun 3, 2022
@bjorn3
Copy link
Contributor

bjorn3 commented Jun 3, 2022

I feel like a RunCriteria would be better than modifying the schedule after the app has already started.

@cart
Copy link
Member

cart commented Jun 3, 2022

I'll review this soon. I'd also love to test your android branch if you can share that.

I think its worth pointing out that this has conceptual overlap with "async and/or deferred plugin init" (see #3239 and this thread). Definitely not worth blocking this pr on, as that could be in "design hell" for awhile and android is something we should support asap. Just connecting relevant threads. In the future we might be able to solve this in a more generic / unified way with solutions in that category of thing.

@rib
Copy link
Contributor Author

rib commented Jun 3, 2022

@bjorn3 the challenge here is that we need to defer the initialization of resources that are required to be able to initialize the RenderApp and associated systems.

I didn't really investigate the possibility of being able to initialize all of the RenderApp and associated systems in a kind of limbo state where they don't have any wgpu adapter/device/window/surface to work with but my initial impression was that that would be a lot more complex to achieve.

One of my earlier iterations had initially tried to introduce the notion that the RenderApp could initially be 'paused' because I was hoping to get away with letting the existing plugin build code continue to run as-is and just let the runner decide when to unpause the render app. It wasn't possible to create the RenderApp to the extent needed though when the wgpu render state was being deferred.

It's perhaps also worth noting that some of the render state is also attached as resources to the top-level App, not just the RenderApp, ref:

app.insert_resource(device.clone())
                .insert_resource(queue.clone())
                .insert_resource(adapter_info.clone())

so not only do we need to mutate the RenderApp once the runner lets us initialize render state, we also need to mutate the top-level App.

Another thorn regarding scheduling system updates was with early CreateWindow events which notably can't be handled until we are ready to init render state. Extra care had to be take to block updates until render state initialization to ensure that pending CreateWindow events don't get discarded by the event system before they are handled.

@rib
Copy link
Contributor Author

rib commented Jun 3, 2022

I'll review this soon. I'd also love to test your android branch if you can share that.

Here's the branch I have with a few extra changes on top of this for being able to run an ndk-glue based Android app. I'll also look at posting the test app somewhere (it's just based on the examples/android.rs code) in case that helps with trying it out.

https://github.com/rib/bevy/commits/android-enabling

I think its worth pointing out that this has conceptual overlap with "async and/or deferred plugin init" (see #3239 and this thread). Definitely not worth blocking this pr on, as that could be in "design hell" for awhile and android is something we should support asap. Just connecting relevant threads. In the future we might be able to solve this in a more generic / unified way with solutions in that category of thing.

Okey, yep, makes sense that there might end up being other more general ways of breaking up plugin initialization in the future.

@rib
Copy link
Contributor Author

rib commented Jun 3, 2022

I'll review this soon. I'd also love to test your android branch if you can share that.

Okey, here's an example Bevy app that should build against my android-enabling branch:
https://github.com/rib/agdk-rust/tree/bevy-enabling/examples/ndk-glue-bevy
(btw, all I've tested so far is that the app launches and renders, and it will almost certainly fall apart if you e.g. switch away from the app without further work on lifecycle handling)

I also just force pushed a small fix for the android-enabling branch (in case you might have already fetched that)

Thanks for taking a look!

@bjorn3
Copy link
Contributor

bjorn3 commented Jun 4, 2022

It wasn't possible to create the RenderApp to the extent needed though when the wgpu render state was being deferred.

It's perhaps also worth noting that some of the render state is also attached as resources to the top-level App, not just the RenderApp, ref:

app.insert_resource(device.clone())
                .insert_resource(queue.clone())
                .insert_resource(adapter_info.clone())

so not only do we need to mutate the RenderApp once the runner lets us initialize render state, we also need to mutate the top-level App.

If you add a run criterion that says to not run for all systems that need those resources in the main app, that would work fine, right? Mutating the World to add the resources after the app is already running is fine IMO. It is the adding of systems to the schedule that I'm not entirely happy about. Using a run criterion would also allow surviving a gpu driver crash by stopping all render systems, while I don't think you can remove systems from the schedule once they are added.

@rib
Copy link
Contributor Author

rib commented Jun 4, 2022

It wasn't possible to create the RenderApp to the extent needed though when the wgpu render state was being deferred.
It's perhaps also worth noting that some of the render state is also attached as resources to the top-level App, not just the RenderApp, ref:

app.insert_resource(device.clone())
                .insert_resource(queue.clone())
                .insert_resource(adapter_info.clone())

so not only do we need to mutate the RenderApp once the runner lets us initialize render state, we also need to mutate the top-level App.

If you add a run criterion that says to not run for all systems that need those resources in the main app, that would work fine, right? Mutating the World to add the resources after the app is already running is fine IMO. It is the adding of systems to the schedule that I'm not entirely happy about. Using a run criterion would also allow surviving a gpu driver crash by stopping all render systems, while I don't think you can remove systems from the schedule once they are added.

Sorry that I can't fully remember the difficulties I hit when I first tried to create the RenderApp in a non-runnable state. In my case I didn't use criterion since I'm not that familiar with Bevy ECS still and didn't know about that mechanism, but I guess the pause mechanism I created had effectively a similar result and I ran into difficulties with that approach.

I figured I'd mention about the gpu resources attached to the top app and the issue with the CreateWindow events as thorny details that came to mind but yeah maybe adding those resources wouldn't really be a problem. (incidentally I wasn't actually clear on what (if anything) even depends on those resources being associated with the App, vs the RenderApp and wouldn't be surprised if they are redundant.)

I guess there are probably other things, besides trying to handle gpu crashes, that could highlight pros/cons of this approach that could be good to consider. I tend to think (as someone that's been a GPU driver developer and seen what's involved in trying to reset + recover a locked up GPU) that Bevy probably shouldn't be concerned with trying to gracefully recover from a GPU crash. It's also difficult to really consider design questions based on this because it's a complex, hypothetical situation where we can't really say whether any particular design would really be helpful unless we have a concrete situation we can verify with.

I wouldn't be surprised if there are hints of the kinds of thing you're thinking about here when it comes to being able to support lifecycle events later - such as dealing with pausing / resuming systems on Android / iOS when the app suspends / resumes but that currently seems like a more constrained problem; e.g. requiring us to delete all extracted windows + surfaces on suspend and then recreate surfaces on resume.

One conceptual challenge that could come from trying to solve this at the ECS system scheduling level may be:

What if render state initialization, and the addition of systems to the App / RenderApp depends on knowing about the capabilities of the GPU or the configuration of the native window the app will be rendering on? I guess a criterion approach to being able to bootstrap systems in a paused state implies that all the systems are otherwise setup in a way that's independent of graphics / window system capabilities?

It also seems like there could be added logical complexity when all of these systems need to be set up in a transient paused state where they are not yet full initialized and they may all need to finish some amount of configuration once they are unlocked for the first time by the resources criterion. That sounds like it might also have its own ordering challenges (since there are lots of inter dependencies within these systems atm which is currently consistent with the order that they are initialized when build()ing plugins. One thing that's convenient with the render_init callbacks is that we can easily piggy back on the same dependency-ordered flow of initialization that plugins already have when being built.

I might be mistaken but my impression is that tackling this at the system scheduling level will involve more intricate modifications of the systems themselves to be able to support deferred initialization within those systems, whereas if we treat render state initialization as this one-time thing that happens (but controlled by the runner) then we get to keep basically all of the existing initialization code and those systems can remain mostly oblivious to the fact that they were initialized lazily - which hopefully keeps them simpler.

@@ -750,6 +755,42 @@ impl App {
self
}

/// Adds a function that will be called when the runner wants to initialize rendering state
///
/// A renderer init function `init_fn` is called only once by the app runner once it has
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this needs an explanation that you can have multiple such functions, and how their relative execution order is determined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken a pass at clarifying this.

Additionally I've now also explicitly asserted that it's possible to assume that the runner is required to block all system updates until after the render_init functions have been called (so, for example, it's clearer that you can assume that events sent while building a plugin will survive, and won't be handled, until the render_init functions have been called)

Copy link
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes make sense and the code and docs look good to me. These "render init functions" add another layer of "do your app initialization in the right order", and will have to be considered when tackling #1255.

@superdump
Copy link
Contributor

I haven’t looked at this yet and will definitely need to. One thing that immediately springs to mind after reading the description is: how does this work with the Image asset loader to handle compressed textures? That needs to know what formats are supported before loading them, which in turn means that the adapter information needs to be known and that is currently obtained after renderer initialisation I believe.

@rib
Copy link
Contributor Author

rib commented Jun 5, 2022

I haven’t looked at this yet and will definitely need to. One thing that immediately springs to mind after reading the description is: how does this work with the Image asset loader to handle compressed textures? That needs to know what formats are supported before loading them, which in turn means that the adapter information needs to be known and that is currently obtained after renderer initialisation I believe.

The image loader also defers initialization, but yep this was a particular case I was also unsure too, since I'd seen the comments about how it depends on checking compressed texture formats. Tbh I havent looked too close yet, besides running the simple android test linked above (which notably doesn't load any images).

Considering that all updates are blocked by the runner until after the render_init callbacks have run then my general hope is that invarients for preexisting code and systems (including the image loader) have been maintained.

The main difference with this change is just that the App and RenderApp are effectively initialized in two passes, and from that point on everything should looks the same as before. Since nothing updates between the two passes then its mostly just the runner that has to consider the limbo state where the app is not yet fully initialized.

But yeah, it definitely warrents closer scruitiny here.

rib added 4 commits June 5, 2022 14:58
This adds an App mechanism for deferring the initialization of render
state via `add_render_init` callbacks.

The App runner is responsible for deciding when to call `app.render_init()`
once it's determined that the system is in a suitable state.

All render_init callbacks are called in the order they are added so they
can have inter dependencies in much the same way as plugin `build()`
functions.
This wraps all plugin initialization that depends on the RenderApp
sub app into a render_init callback so that render state initialization
can be deferred by the runner.

For now the winit runner immediately calls app.render_init() so it's
not yet significantly deferred, but plugin building and render state
initialization now happen in two separate passes.

Note: This patch intentionally avoids making functional change to the
code that's moved into render_init functions to help keep this easy
to review. In the case of bevy_render/src/lib.rs any change in
indentation was also avoided to help reduce churn.
On Android the render state is now only initialized when the application
resumes for the first time (at which point it as a valid SurfaceView).

On all other platforms the render state will be initialized based on the
`NewEvents(Init)` event (though this may change in the future if Winit
starts to emit Resumed events consistently for all platforms).

Considering how some events (CreateWindow) events can only be handled
along with render state initialization (at least on Android) this
change also blocks all app update()s until the render state is
initialized, otherwise these events are liable to be cleared before
they can be handled.
The texture loaders depend on being able to query a `RenderDevice` for
supported compressed texture formats, and if no device is found they
(silently) fallback to assuming `all()` formats are supported.

This moves the initialization of the `ImageTextureLoader` and
`HdrTextureLoader` asset loaders into the `.add_render_init()` callback
in `ImagePlugin::build()` to ensure they are only initialized after the
`RenderDevice` has been created.

Tested with `examples/3d/texture.rs`
@rib
Copy link
Contributor Author

rib commented Jun 5, 2022

I haven’t looked at this yet and will definitely need to. One thing that immediately springs to mind after reading the description is: how does this work with the Image asset loader to handle compressed textures? That needs to know what formats are supported before loading them, which in turn means that the adapter information needs to be known and that is currently obtained after renderer initialisation I believe.

Okey, so when I looked into this more closely I found that actually since the initialization of the texture loaders wasn't moved into the render_init callback with my first patch then when they came to query what compressed texture formats are supported they would fail to find a RenderDevice and silently fallback to enabling all() formats.

I've just rebased the branch with an additional patch that now moves the initialization of the texture loaders into a render_init function too, and running examples/3d/texture.rs I was able to see (with a little instrumentation) that the compressed texture formats were now being initialized when there was a RenderDevice available.

@bjorn3 bjorn3 mentioned this pull request Jun 16, 2022
Copy link
Contributor

@superdump superdump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Just a couple of minor comments.

Comment on lines -295 to -296
// NOTE: Load this after renderer initialization so that it knows about the supported
// compressed texture formats
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to keep this comment as I feel the dependency is less obvious and it clarifies why for future readers.

{
app.init_asset_loader::<HdrTextureLoader>();
}

app.add_plugin(RenderAssetPlugin::<Image>::with_prepare_asset_label(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to realise that the RenderAssetPlugin takes care of the deference of initialisation itself when trying to understand why this wasn't also moved inside the .add_render_init() so perhaps a comment explaining that here would avoid an indirection to go look it up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking I followed a fairly mechanical process of deferring code that was explicitly dependent on the RenderApp sub app and the texture loaders here where handled as more of a special case after I investigated the ordering regarding initializing compressed texture formats.

Considering the possibility of inter-dependencies between plugins I suppose I wouldn't generally expect that plugin additions for the top-level app would be deferred (in case other plugins might need to reference it while building the app) and would expect it to be the responsibility of individual plugins to know what's appropriate to defer in terms of initialization code.

I'm saying this without being intimately familiar with lots of details in bevy (I'm not familiar with what the RenderAssetPlugin is for) but the way I see it atm is that it's an implementation detail for RenderAssetPlugin to decide whether it should be deferring certain initialization and so since it's a top-level app plugin I'd expect to see it get added in the first build() pass.

Another way of looking at it is that there's conceptually a difference between adding plugins and initializing plugins even though those two concerns are essentially inter-related with how build() works. I'd generally expect that all top-level plugins should be added unconditionally during the first build() and it's only various bits of initialization logic for the plugins that should be deferred - and for that it's up to the individual plugins to know what initialization code needs deferring.

Something else to consider when thinking about this is that the render_init mechanism is simple in the sense that it doesn't have any kind of understanding of app/plugin topology or recursion, so it e.g. wouldn't currently work to try and defer the addition of a plugin that itself might want to try and explicitly use a render_init callback to defer initialization work. This is because the render init callbacks are simply invoked in one pass by the runner.

Keeping the callback mechanism itself simple is hopefully for the best here but potentially it would be reasonably feasible to support running the callbacks in multiple passes to allow for that kind of deferred plugin addition combined with internal initialization deferral. In this case the runner would essentially keep invoking the render_init callbacks until there were none left. Even though I think it could be good to avoid needing this kind of multi-pass initialization it would probably be good to add some explicit checks to make sure nothing inadvertently tries to register a render_init callback while the render_init callbacks are being called.

I'm wondering now whether it could be good to say more in the docs for .add_render_init() about distinguishing plugin addition from plugin initialization to help set that expectation that top-level plugins are generally expected to be added unconditionally while building() - so then maybe this would seem less surprising here?

Essentially the imperative app building implicitly defines a kind of graph of plugin dependencies, based on the order they are added. The render_init callback mechanism also conveniently piggy backs on that ordering such that the order things are built + added will also correspond to the order that render initialization happens - and that's generally based on the expectation that all (top-level app) plugins are added in one pass while building the app.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your arguments about ownership, responsibility, encapsulation, things being an implementation detail seem reasonable, but if the implementation detail has consequences outside the thing (RenderAssetPlugin) then the encapsulation is already broken. As a random example - if the order of deferred initialisation matters and someone doesn't realise that RenderAssetPlugin is doing deferred initialisation, that could catch them out.

I think whether it is surprising or not is also a matter of perspective. For code, I bias toward the idea that if I can't see it, I don't immediately know it's happening. The unseen thing is then either something that is more likely to catch me out, or it is additional cognitive load to remember so that I don't get caught about by it.

I don't expect people working on rendering features to know/remember everything about how the rest of bevy works. Not to mention that rendering is complicated enough on its own. :) If someone is coming from a rendering perspective and reading this code, it would be easy for them to miss that there is deferred initialisation stuff happening in RenderAssetPlugin and make changes based on what they see.

All I am suggesting to help with this is a small comment on the RenderAssetPlugin add_plugin line here to make it clear that that is doing deferred renderer initialisation stuff to try to help future readers (including myself honestly) to not make changes here and break initialisation on Android that could otherwise go unnoticed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah happy to add a comment, but I'm also trying to figure out what the comment should say (since we currently have different ideas atm about what's expected, and I might not be aware of some inter dependency here)

I was just meaning to see if highlighting those details might change your expectations and then I could figure out whether it'd be better to have extra docs for .add_render_init() or have a comment here.

The expectation I was looking to nudge towards was that you should be surprised to see any .add_plugin for a top level plugin be deferred via .add_render_init.

Is there maybe some specific inter dependency here with RenderAssetPlugin and the texture loaders that's important to track that I'm not familiar with which I can highlight? I guess there was some particular reason you were concerned to check how RenderAssetPlugin defers its initialization because you know there's an inter dependency here?

It's notable that my change to defer app.init_asset_loader::<ImageTextureLoader>(); does effectively swap the order relative to any deferred RenderAssetPlugin initialization and maybe it's dumb luck that that's ok, or maybe it's not ok?

I'm unsure about just commenting that RenderAssetPlugin will do deferred initialization without a more specific inter dependency to highlight because that'd be documenting an implementation detail that could change in the future? It's possible that any plugin you add might choose to defer some initialization, so guess there should be some specific ordering concern to highlight?

@alice-i-cecile alice-i-cecile modified the milestone: Bevy 0.8 Jul 4, 2022
Copy link
Member

@cart cart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried doing a merge into main to test this on my android phone (in combination with #5130), but I get a blank screen. Idk if I missed something in the merge or if these changes aren't compatible, but I couldn't get anything to draw on the screen. On main, it at least sometimes renders to the screen for me.

Here are my logs.

07-07 19:56:37.126  1588  1881 I ActivityManager: Start proc 20188:org.bevyengine.example/u0a270 for pre-top-activity {org.bevyengine.example/android.app.NativeActivity}
07-07 19:56:37.137 20188 20188 E yengine.exampl: Not starting debugger since process cannot load the jdwp agent.
07-07 19:56:37.138  3725  5014 I AssistantForeground: (REDACTED) Get launcher package: %s
07-07 19:56:37.146 20188 20188 D CompatibilityChangeReporter: Compat change id reported: 171979766; UID 10270; state: ENABLED
07-07 19:56:37.149 20188 20188 W System  : ClassLoader referenced unknown path: 
07-07 19:56:37.154 20188 20188 V GraphicsEnvironment: ANGLE Developer option for 'org.bevyengine.example' set to: 'default'
07-07 19:56:37.154 20188 20188 V GraphicsEnvironment: ANGLE GameManagerService for org.bevyengine.example: false
07-07 19:56:37.155 20188 20188 V GraphicsEnvironment: Neither updatable production driver nor prerelease driver is supported.
07-07 19:56:37.155 20188 20188 D NetworkSecurityConfig: No Network Security Config specified, using platform default
07-07 19:56:37.155 20188 20188 D NetworkSecurityConfig: No Network Security Config specified, using platform default
07-07 19:56:37.162 20188 20204 D vulkan  : searching for layers in '/data/app/~~UQSRHp_0tp879I2cdru6ng==/org.bevyengine.example-dWeck3wdvy2IIK8PvtFfKA==/lib/arm64'
07-07 19:56:37.163 20188 20204 D vulkan  : searching for layers in '/data/app/~~UQSRHp_0tp879I2cdru6ng==/org.bevyengine.example-dWeck3wdvy2IIK8PvtFfKA==/base.apk!/lib/arm64-v8a'
07-07 19:56:37.167 20188 20204 D mali_cmarp_predictor: checking cmar_predictor for org.bevyengine.example
07-07 19:56:37.189 20188 20217 I event crates/bevy_render/src/renderer/mod.rs:98:  AdapterInfo { name: "Mali-G78", vendor: 5045, device: 2449604624, device_type: IntegratedGpu, backend: Vulkan }
07-07 19:56:37.190 20188 20217 D mali_cmarp_predictor: checking cmar_predictor for org.bevyengine.example
07-07 19:56:37.191  1588  1870 I ActivityTaskManager: Displayed org.bevyengine.example/android.app.NativeActivity: +87ms
07-07 19:56:37.201 27976 27976 I GoogleInputMethodService: GoogleInputMethodService.onFinishInput():3220 
07-07 19:56:37.205 27976 27976 I GoogleInputMethodService: GoogleInputMethodService.updateDeviceLockedStatus():2116 checkRepeatedly = false, unlocked = true
07-07 19:56:37.205 27976 27976 I GoogleInputMethodService: GoogleInputMethodService.onStartInput():1923 onStartInput(EditorInfo{inputType=0x0(NULL) imeOptions=0x0 privateImeOptions=null actionName=UNSPECIFIED actionLabel=null actionId=0 initialSelStart=-1 initialSelEnd=-1 initialCapsMode=0x0 hintText=null label=null packageName=org.bevyengine.example fieldId=-1 fieldName=null extras=null}, false)
07-07 19:56:37.205 27976 27976 I GoogleInputMethodService: GoogleInputMethodService.updateDeviceLockedStatus():2116 checkRepeatedly = true, unlocked = true
07-07 19:56:37.229 20188 20217 E BufferQueueProducer: [ViewRootImpl[NativeActivity]#0(BLAST Consumer)0](id:4edc00000000,api:1,p:20188,c:20188) connect: already connected (cur=1 req=1)
07-07 19:56:37.229 20188 20217 E vulkan  : native_window_api_connect() failed: Invalid argument (-22)
07-07 19:56:37.229 20188 20216 I RustStdoutStderr: thread '<unnamed>' panicked at 'AndroidSurface failed: ERROR_NATIVE_WINDOW_IN_USE_KHR', /home/carter/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/wgpu-hal-0.12.5/src/vulkan/instance.rs:350:69
07-07 19:56:37.229 20188 20216 I RustStdoutStderr: stack backtrace:
07-07 19:56:37.230 20188 20216 I RustStdoutStderr: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
07-07 19:56:37.233  3725  3725 I GsaVoiceInteractionSrv: Handling ACTION_STOP_HOTWORD
07-07 19:56:37.249 20188 20216 I RustStdoutStderr: thread 'Compute Task Pool (0)' panicked at 'internal error: entered unreachable code: sending into a closed channel', crates/bevy_ecs/src/schedule/executor_parallel.rs:200:49
07-07 19:56:37.250 20188 20216 I RustStdoutStderr: thread 'Compute Task Pool (1)' panicked at 'internal error: entered unreachable code: sending into a closed channel', crates/bevy_ecs/src/schedule/executor_parallel.rs:200:49
07-07 19:56:37.250 20188 20216 I RustStdoutStderr: stack backtrace:
07-07 19:56:37.250 20188 20216 I RustStdoutStderr: thread 'Compute Task Pool (2)' panicked at 'internal error: entered unreachable code: sending into a closed channel', crates/bevy_ecs/src/schedule/executor_parallel.rs:200:49
07-07 19:56:37.250 20188 20216 I RustStdoutStderr: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
07-07 19:56:37.250 20188 20216 I RustStdoutStderr: stack backtrace:
07-07 19:56:37.250 20188 20216 I RustStdoutStderr: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
07-07 19:56:37.250 20188 20216 I RustStdoutStderr: stack backtrace:
07-07 19:56:37.250 20188 20216 I RustStdoutStderr: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
07-07 19:56:37.575  1588  3796 W InputManager-JNI: Input channel object '9949d8e Splash Screen org.bevyengine.example (client)' was disposed without first being removed with the input manager!

Happy to push my (potentially broken) merge. But feel free to try merging on your own / see if you can get something working.

.add_system_to_stage(RenderStage::Queue, queue_material2d_meshes::<M>);
}

app.add_render_init(move |app| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my biggest critique at this point is that RenderApp init is now pretty arcane. Devs now need to know that RenderApp isn't available until the render_init step and then they still need to call get_sub_app_mut(RenderApp) to get access to the RenderApp.

I think we could abstract over this a bit better, maybe by providing a RenderPlugin trait:

trait RenderPlugin {
  fn build(&self, app: &mut App, render_app: &mut App);
}

Then devs no longer need to know about "init lifecycles". They just implement a trait and call add_render_plugin(MyPlugin). No need to know about RenderApp labels or deal with get_sub_app results. Given that sub_apps exist solely for the renderer at this point (and that pipelined rendering will further derail the sub_app approach to things), we could even consider removing sub_apps entirely in favor of letting the runner pass around the render app directly.

This would also pair nicely with supporting "headless" rendering. Right now we support it by convention by calling get_sub_app_mut and not failing when it doesn't exist. We could abstract that out from users / make it "fool proof" by just not initializing RenderPlugins in headless mode.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could still use the "closure" pattern, but pass in the RenderApp reference directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, having a trait method for render state initialization instead of a closure sounds reasonable to me

I wasn't initially sure whether the dependence on the RenderApp should conceptually be considered an implementation detail (I wasn't sure if technically it's supposed to be possible to create alternative plugins that would come with a different rendering architecture etc that might have a different sub app, but I guess not?)

The only render init code I can think of atm that doesn't directly touch the RenderApp might be the code that initializes the texture loaders, so having the mechanism directly pass around the render sub app sounds good / practical.

Technically I guess this will be pretty similar to the render_init callbacks in the sense that there would still need to be a similar vector of render plugins that have been added, to ensure they are initialized in the same order they are added - the main difference I suppose is just switching to a trait method instead of a closure, plus the convenience of passing the render_app directly.

There will be a small loss of generality I suppose in that render init code will be tightly coupled to a plugin (technically orthogonal build code could end up registering multiple render init callbacks per plugin with the current closure design) but that doesn't seem like it would be an issue in practice.

I half wonder if 'build' is the right verb vs 'init' but maybe it's ok. I currently associate the recursive addition of plugins as being part of the 'build' process and there's some fuzzy line between code that's 'initialization' vs 'building' and since all plugin additions will still presumably be done during app building I'm not sure if this is more 'initialization' than it is 'building'.

/// The [render init functions](Self::add_render_init) are responsible for completing
/// the initialization of plugins once the `runner` has determined that the system is
/// ready to support rendering.
pub render_inits: Vec<Box<dyn FnOnce(&mut App)>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the topic of "should this be included in 0.8": I think that largely comes down to "can we make android work better than it does on main / with #5130". If yes, I think I'm happy to merge either as-is or with RenderPlugin-like changes (see my last comment). If no, I think it would be better to hold off and take our time until we find something that improves on the current state of things.

@komadori
Copy link
Contributor

I tested this branch on my Android device (Samsung S21+). With no code changes to the branch, the example always fails early with wgpu::RequestDeviceError::DeviceLost. Hence, I merged only the WgpuSettingsPriority::Compatibility change to the example from #5130 and it now always fails slightly later with the same ERROR_NATIVE_WINDOW_IN_USE_KHR error as cart reported above. This problem seems to be on the branch (and in the variation between Android devices) therefore rather than related to any merging with main.

@rib
Copy link
Contributor Author

rib commented Jul 12, 2022

TL;DR - please remember that I shared a separate android-enabling branch here for testing Android specifically - considering this isn't an Android-only change and it's only really one of multiple issues that need to be tackled to get Android working in Bevy. In particular that branch avoids creating a second surface that will result in an ERROR_NATIVE_WINDOW_IN_USE_KHR error on Android.

Hi, there are two likely reasons I can think of for an ERROR_NATIVE_WINDOW_IN_USE_KHR error on Android which I'd generally say are related to different issues that needs addressing besides being able to defer the initialization of render state:

  1. Bevy creates a second transient surface when it tries to find a compatible wgpu adapter - instead of somehow referencing any surface already created for a primary window. Generally speaking this doesn't seem ideal for any platform, considering that surfaces may have large allocations associated with them, which sometimes also come out of limited regions of memory. On Android though Vulkan will throw this error. Ref: rib@1ea04f0

  2. ndk-glue has some design issues with how it synchronizes with Java which notably affects how Winit/Bevy may synchronize with a window Destroyed event. I think it's more likely that this could lead to trying to render after the surface has been destroyed but maybe it could also lead to a situation where Bevy ends up trying to create multiple surfaces too - I'm not 100% sure about that case off the top of my head.

Part of the challenge is that there are multiple notable issues with supporting Bevy on Android and deferring the render initialization is only really one of those things.

Other issues include:

  1. Finding a solution for avoiding the creation of multiple surfaces (as mentioned above).
  2. Application lifecycle tracking that can properly support dropping and re-creating the wgpu surface for the primary window when the app suspends and later resumes.
  3. Support for tracking Android Configuration changes for properly handling things like orientation changes.
  4. Support for hooking into Android Save State events for applications to be able to save and restore state that allows seamless continuation after being stopped/started
  5. probably more things...

My approach was to have a separate branch that validated making forward-progress with Android which was based on this work, but this branch in itself isn't a full solution for supporting Android. I mentioned this above and after @cart asked I also shared my Android branch and pointed to an example that I had tested/validated on Android based on that branch:

Here's the branch I have with a few extra changes on top of this for being able to run an ndk-glue based Android app. I'll also look at posting the test app somewhere (it's just based on the examples/android.rs code) in case that helps with trying it out.

https://github.com/rib/bevy/commits/android-enabling

ref: #4913 (comment)

It's maybe also worth noting here that there is ongoing work happening in Winit to look at potentially replacing ndk-glue with a glue layer that follows a different design; solving the current Java synchronization issues as well as several other issues (such as supporting multiple Activity base classes): ref rust-windowing/winit#2293 (comment) and ref: https://github.com/rib/android-activity

@cart
Copy link
Member

cart commented Jul 14, 2022

Thanks for the thorough response @rib. I understand that there are a lot of variables here and that this type of deferred renderer init is an important stepping stone. I dont expect this PR to add full android support. I was a little surprised that it appears to have regressed our "hacks" that kind-of-sort-of make android work sometimes. Not a big deal, and I'm happy to regress our hack-ey solution in the interest of getting closer to a "correct" solution.

I'll probably hold off on merging this until after 0.8, in the interest of working out the ideal "deferred init api" (see my comment above), and also releasing 0.8 with kinda-sorta-but-not-really-working android support (to indicate to the community that we're investing in this).

@alice-i-cecile alice-i-cecile mentioned this pull request Jul 18, 2022
@cart cart removed this from the Bevy 0.8 milestone Jul 19, 2022
@rib rib mentioned this pull request Dec 3, 2022
3 tasks
@superdump
Copy link
Contributor

@mockersf this is resolved isn't it?

@alice-i-cecile
Copy link
Member

This has been resolved with a similar approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen A-Windowing Platform-agnostic interface layer to run your app in C-Bug An unexpected or incorrect behavior O-Android Specific to the Android mobile operating system P-High This is particularly urgent, and deserves immediate attention
Projects
Status: Responded
Development

Successfully merging this pull request may close these issues.

6 participants