[Merged by Bors] - Fix staging buffer required size calculation (fixes #1056) #1509
This could also fix #1138.
I'm pretty sure I also tried this approach at some point. For what it's worth, I'm super excited to have someone else experiencing this crash and motivated to dig into the engine to try to fix it.

As far as testing this, I have:

* https://github.com/rparrett/bevy-test/tree/render-panic which is a minimal test case that this seems to fix.
* https://github.com/rparrett/taipo/tree/spritesheet-panic-two which is an old snapshot of my game, modified to self-play until it crashes. This changeset seems to crash the game even sooner with a very similar panic. But it's a nightmare of forked dependencies and I'm not positive that I ended up testing this well.
* https://github.com/rparrett/taipo which is currently crashing somewhat consistently, but only after over 5 minutes of play, which requires being able to read some Japanese. I'm out of time to test this one, but I'll check it out later tonight.
I've tested this a couple of times as-is (without any change), and it never crashed. The crabs just went on to attack the rock and stayed there forever. So perhaps we are dealing with different bugs? I've tried applying the proposed changes on top of the used version of bevy, but as soon as
Got this to compile and load in the browser, but the game didn't start after hitting "English". Not sure what to try out next.
I would really appreciate it, thanks!
I should mention that I believe that a high-dpi display is required to reproduce this. I've been meaning to try just doubling the font sizes on my low-dpi machine, but haven't gotten around to it.
Not sure what could be going on there. Not something I've experienced on either machine I have access to. No errors in the JavaScript console? I wouldn't be surprised if there's some new-bevy-scheduler-related bug though. But also I haven't actually tried to reproduce the panic in english-mode, because I'm assuming that large/multiple font atlases are playing a role.
Really appreciate you digging into this so much. So I suspect that the "game doesn't start after pressing a button in the main menu" issue you're seeing could actually be the game panicking. I got the auto-typing / auto-crashing working on the latest version of the game which works with the latest version of bevy. https://github.com/rparrett/taipo/tree/spritesheet-panic-three (click the third menu item, "Kana + N5 + Yamanote") And what I'm seeing is:
You're absolutely right, this is indeed a panic. I'll try to figure out what's happening this time, I probably missed something. At least now it's really quick to reproduce :)
I just had a fairly large dose of good old humble pie. Turns out the buffer allocation logic in I've revised my PR, and it now fixes both test-cases. I went over
I've updated Sadly, with this commit, it seems that we're just delaying the panic for about 9 seconds.
I wouldn't be surprised if there were multiple bugs producing this same panic, but the code is a bit too arcane for me to determine whether some strange thing I see in there is actually a bug.
I've played
It would really be helpful if someone could figure out how to reproduce bug B on my system. Maybe tomorrow I can try this out on another box I have.
Bizarre that you're no longer seeing panics in
I suppose I could try building on my other machine, but I'd expect these wasm builds to be fairly universal. What's your setup over there? I'm mainly building on macOS (4K display) and running the game in Chrome. Another thing to mention is that this combination of chrome + There may be multiple bugs, but I think it's also super easy to make this panic temporarily go away by changing some code that results in the staging buffer just being oversized instead of undersized.
I was building on a Linux laptop (HD display) and using Firefox. The panics I saw in I've just tried this on another Linux box with a 4K display, and I can now consistently reproduce the problem. Perhaps this is a buffer alignment bug? I've made a few local changes to
Okay, cool. I updated Improving the wasm build situation and allowing optional native builds is on my todo-list, but I've got sort of a delicate balance going on with the third party bevy plugins and I don't want to fall back into a cargo-feature-dependency-heck again. I did suspect some sort of buffer alignment thing early on and spent a day working that angle but didn't get anywhere. Possibly worth another look now that I'm a bit more familiar with bevy's innards.
I've updated my minimal test case to be slightly more crashy based on a previous observation in Taipo that adding another font seems to make things worse. I added some static text that just says "hi" in Comic Neue and now it's easier to produce the panic. It now crashes both of the commits in this PR. https://github.com/rparrett/bevy-test/render-panic I've finished up the Taipo gameplay stuff I wanted to and will hopefully be digging into this again soon.
Great news! Those changes made I think I finally figured it out. Not only were the size calculations mixed up, but there is a Here's a sorted side by side comparison of buffer items at
Some items were
Turns out data race isn't exactly the right description. It was more of a size computation over an incomplete set. New attempt, this time successfully fixing it on all my machines (HD and 4K). @rparrett could you please try this one? EDIT: also updated the PR description/explanation above.
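To make "a size computation over an incomplete set" concrete, here is a minimal sketch of the failure mode. It is a toy model, not bevy's code: `AssetId`, `required_size`, and the byte lengths are all hypothetical. It only shows that summing sizes over the changed assets undercounts what a full copy of every asset needs.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical asset id; bevy's real handle type is not modelled here.
type AssetId = u32;

/// Sum the byte lengths of `assets`, optionally restricted to the ids in `only`.
fn required_size(assets: &HashMap<AssetId, usize>, only: Option<&HashSet<AssetId>>) -> usize {
    let mut total = 0;
    for (id, byte_len) in assets {
        // With no filter, every asset counts toward the total.
        if only.map_or(true, |set| set.contains(id)) {
            total += *byte_len;
        }
    }
    total
}

fn main() {
    // Three assets with made-up byte lengths; only asset 2 changed this frame.
    let assets = HashMap::from([(1, 256), (2, 512), (3, 64)]);
    let changed = HashSet::from([2]);

    let partial = required_size(&assets, Some(&changed));
    let full = required_size(&assets, None);

    // A staging buffer sized from `partial` is too small for a full copy.
    println!("changed only: {partial} bytes, all assets: {full} bytes");
    assert!(partial < full);
}
```

If a resize then forces every asset to be copied, a buffer sized from the changed subset cannot hold the transfer, which matches the undersized-buffer symptom discussed in this thread.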
So this changeset now prevents crashes in
I've also done some very quick/non-comprehensive testing of bevy examples and I don't think anything's broken. But I think the bar may still be somewhat high for proving that these changes are good while Cart is still occupied with The I wonder if a similar change to Maybe we can convince @mockersf to come around for a third opinion.
From what I understood from #1208 that code is bound for a rewrite sooner rather than later. So the bar may actually be pretty low right now.
This PR should change nothing in terms of buffer size, the added penalty is in terms of a few unneeded
I don't think it matters, because I don't know what a normal use-case looks like, but for instance in
I'm not that confident in what this code is doing exactly, but this PR change is very small, and the only potential issue I see is that we are preparing buffers for assets that didn't change. This would probably send empty (but ignored) buffers to the GPU?

Ideally, we should only do that for all assets if the buffer arrays have been resized, here something like

```rust
if resized {
    // keep track if we added something new to the buffer arrays
    let mut unchanged_added = false;
    for (asset_handle, asset) in assets.iter() {
        // changed assets were already prepared; only handle the unchanged ones
        if !changed_assets.contains_key(&asset_handle) {
            uniform_buffer_arrays.prepare_uniform_buffers(asset_handle, asset);
            unchanged_added = true;
        }
    }
    if unchanged_added {
        // Something was added, so resize the buffer arrays again
        uniform_buffer_arrays.resize_buffer_arrays(render_resource_context);
    }
    uniform_buffer_arrays.set_required_staging_buffer_size_to_max();
}
```

after testing, calling

```rust
if resized {
    for (asset_handle, asset) in assets.iter() {
        // changed assets were already prepared; only handle the unchanged ones
        if !changed_assets.contains_key(&asset_handle) {
            uniform_buffer_arrays.prepare_uniform_buffers(asset_handle, asset);
        }
    }
    uniform_buffer_arrays.set_required_staging_buffer_size_to_max();
}
```

This creates smaller buffers while still removing all crashes for me.

It depends on what is costlier:

that... I don't know. The first probably costs more CPU when resizing is needed, the second maybe more memory and more CPU (but less than the resize case for the other possibility).

The other change in this PR (https://github.com/bevyengine/bevy/pull/1509/files#diff-454d965b87cfdc4fb81e3da21b073fe7545caa90469aa65fab5ab221db064025L219) makes complete sense given the name of the method.
This seems pretty difficult to benchmark, but just from looking at framerates in my own game after disabling vsync, I can't really see any difference through the noise between the current state of this PR and I think this is probably ready for cart to weigh in when he has a moment.
Each time Shortly after that,
The first part of this PR just does (1) and (2) for all assets, rather than for changed ones. (1) should normally be a no-op, so (2) keeps the staging buffer large enough to fit all assets. Otherwise the staging buffer may be too small for a full transfer, which will happen There should be no effect on what is sent to the GPU, only in the sizing of the staging buffer. The bindings and writes are done later.
The end result should be about the same, but I agree. I'll update the PR.
I don't think there will be any measurable performance difference here. What may end up mattering more is how often the staging buffer is resized (and every single asset copied over to the GPU). Allocating a larger buffer means fewer allocations and fewer full copies. But resizes should be rather rare anyway.
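As a rough illustration of the "larger buffer means fewer resizes" point, the sketch below counts reallocations for a sequence of size requests. Everything in it is an assumption made for the example: `count_resizes`, the `headroom` multiplier, and the request sizes are hypothetical and do not come from bevy.

```rust
/// Count how many times a buffer must be reallocated to satisfy `requests`,
/// when each reallocation reserves `headroom` times the size that was needed.
fn count_resizes(requests: &[usize], headroom: usize) -> usize {
    let mut capacity = 0;
    let mut resizes = 0;
    for &needed in requests {
        if needed > capacity {
            // Each resize stands in for a reallocation plus a full copy.
            capacity = needed * headroom;
            resizes += 1;
        }
    }
    resizes
}

fn main() {
    // Made-up sequence of required staging sizes, in bytes.
    let requests = [100, 120, 150, 150, 200, 260];
    println!("exact fit: {} resizes", count_resizes(&requests, 1));
    println!("2x headroom: {} resizes", count_resizes(&requests, 2));
}
```

The trade-off is memory for fewer full copies; how much that matters depends on how often the required size actually grows.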
52cdba3 to 69c1bb1
* when doing a full asset copy (resize), prepare_uniform_buffers() is now called on all assets rather than just changed ones. This makes sure the staging buffer is large enough.
* set_required_staging_buffer_size_to_max() now doesn't overwrite the value computed by prepare_uniform_buffers() if the resulting size would be smaller.

Co-authored-by: Renato Caldas <renato@calgera.com>
Co-authored-by: François <mockersf@gmail.com>
Nice work here. This makes sense to me as a workable stop-gap solution.
bors r+
Fix staging buffer required size calculation (fixes #1056)

The `required_staging_buffer_size` is currently calculated differently in two places, each will be correct in different situations:

* `prepare_staging_buffers()` based on actual `buffer_byte_len()`
* `set_required_staging_buffer_size_to_max()` based on item_size

In the case of render assets, `prepare_staging_buffers()` would only operate over changed assets. If some of the assets didn't change, their size wouldn't be taken into account for the `required_staging_buffer_size`. In some cases, this meant the buffers wouldn't be resized when they should. Now `prepare_staging_buffers()` is called over all assets, which may hit performance but at least gets the size right.

Shortly after `prepare_staging_buffers()`, `set_required_staging_buffer_size_to_max()` would unconditionally overwrite the previously computed value, even if using `item_size` made no sense. Now it only overwrites the value if bigger.

This can be considered a short term hack, but should prevent a few hard to debug panics.
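The "only overwrites the value if bigger" part of the description can be sketched with a toy model. This is not bevy's implementation: `StagingTracker`, its methods, and the assumption that the max-based size is `item_size * item_count` are all hypothetical, chosen only to contrast an unconditional overwrite with a grow-only update.

```rust
/// Toy model of the size bookkeeping; all names here are hypothetical.
struct StagingTracker {
    required_size: usize,
}

impl StagingTracker {
    /// First calculation: derive the size from the actual byte lengths.
    fn prepare(&mut self, byte_lens: &[usize]) {
        self.required_size = byte_lens.iter().sum();
    }

    /// Old behaviour in this model: always overwrite with the item-size estimate.
    fn set_to_max_overwrite(&mut self, item_size: usize, item_count: usize) {
        self.required_size = item_size * item_count;
    }

    /// New behaviour in this model: keep whichever value is larger.
    fn set_to_max_if_bigger(&mut self, item_size: usize, item_count: usize) {
        self.required_size = self.required_size.max(item_size * item_count);
    }
}

fn main() {
    // Two buffers whose real byte lengths exceed the per-item estimate of 256.
    let byte_lens = [300, 500];

    let mut old = StagingTracker { required_size: 0 };
    old.prepare(&byte_lens);
    old.set_to_max_overwrite(256, 2);

    let mut new = StagingTracker { required_size: 0 };
    new.prepare(&byte_lens);
    new.set_to_max_if_bigger(256, 2);

    // The overwrite shrinks the requirement below what the data needs.
    println!("overwrite: {}, grow-only: {}", old.required_size, new.required_size);
}
```

With the grow-only update the requirement never drops below the size computed from real byte lengths, so the second calculation can no longer undo the first.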
Thanks everyone! I am so very excited to un-fork bevy!
Pull request successfully merged into main. Build succeeded: