Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: wgpu 23 adds extra dispatch on Vulkan for indirect buffer copy, breaks binding reuse #6567

Open
djeedai opened this issue Nov 18, 2024 · 8 comments
Labels
area: performance How fast things go area: validation Issues related to validation, diagnostics, and error handling

Comments

@djeedai
Copy link

djeedai commented Nov 18, 2024

Description

On Vulkan (win11) with wgpu 23, every single of my indirect dispatch is now preceded by a vkCmdDispatch that I never dispatched myself. The dispatch seems to copy the indirect args from my buffer into a wgpu-owned buffer. The dispatch occurs just-in-time, after all bindings have been set for my own indirect dispatch, which means I get:

  1. set_pipeline (my pipeline)
  2. bunch of set_bind_group (my bind groups)
  3. set_pipeline (wgpu)
  4. bunch of set_bind_group (wgpu)
  5. vkCmdDispatch
  6. same as 1.
  7. same as 2.
  8. vkCmdIndirectDispatch

Step 3 to 5 didn't exist before, and not only look useless (why does wgpu copy my indirect params, I already took care of maintaining a buffer for them, it's not to have that work ignored) but also adds a lot of unnecessary state changes (steps 1. and 2., overwritten by 3. and 4.) which are slow.

Repro steps

git clone https://github.com/djeedai/bevy_hanabi.git -b u/bevy15
cargo r --example firework --no-default-features --features="bevy/bevy_winit bevy/bevy_pbr 3d "

Expected vs observed behavior

Same as previously, no step 3. and 4. above, which do not correspond to any API call made from Rust.

Extra materials

RenderDoc capture:
wgpu-extra-dispatch-vulkan-win11.zip

The follow results from a single indirect dispatch (I never called dispatch_workgroup() (non-indirect)):
image

Pipeline state of the injected (non-indirect) dispatch, showing it copies from my buffer to a wgpu one:
image

Platform

wgpu 23, win11, bevy 0.15-rc.3

@ErichDonGubler
Copy link
Member

@teoxoy: Isn't this a dupe of our intent to allow configuration to avoid the injected indirect dispatch pass? I'm not sure what issue that is, though.

@cwfitzgerald
Copy link
Member

Yeah we want to have this optional somehow.

Step 3 to 5 didn't exist before, and not only look useless (why does wgpu copy my indirect params, I already took care of maintaining a buffer for them, it's not to have that work ignored).

For this part, we are doing it to validate that your indirect args are within the limits of the adapter. This is an unfortunate requirement for running untrusted programs on the web, which is why we want a way to disable it on native.

The unnecessary state changes are a bug in the ordering of the validation code.

@djeedai
Copy link
Author

djeedai commented Nov 23, 2024

This is an unfortunate requirement for running untrusted programs on the web

Are you saying that wgpu needs this to work on wasm? Or that some app using wgpu (Firefox?) requires this, and is imposing its own specific requirements to the library and all its other users? Because I've been running wasm example apps without this change, even with bugs dispatching 3 million workgroups (oops...), and never ever crashed a browser let alone a GPU device.

@Wumpf
Copy link
Member

Wumpf commented Nov 23, 2024

Are you saying that wgpu needs this to work on wasm?

The validation injected there is necessary for Browser vendors to implement in order to become WebGPU compliant: See queue timeline validation steps here https://www.w3.org/TR/webgpu/#dom-gpucomputepassencoder-dispatchworkgroupsindirect
So to be more precise: wgpu compiling to wasm doesn't contain that code, it directs directly towards the WebGPU javascript api (ignoring the WebGL fallback). But that WebGPU implementation inside the Browser is forced to do something like that. In Firefox that implementation is wgpu-core, i.e. same thing that runs when you build wgpu for a native application.

The fact that you never saw an issue there before either means that the validation still passed or the browser you ran with didn't validate that yet 🤷

In any case as cwfitzgerald says, this is a typical validation that we'd like to make optional for native apps that typically don't need all the validation steps required for a WebGPU compliant implementation.

@Wumpf Wumpf added area: validation Issues related to validation, diagnostics, and error handling area: performance How fast things go labels Nov 23, 2024
@djeedai
Copy link
Author

djeedai commented Dec 7, 2024

There seem to be another issue with this new feature: the extra injected pass is binding 64 bytes of my dispatch buffer, but the storage buffer alignment is 32 bytes on my GPU, and my dispatch struct is also 32 bytes long. And to the best of my understanding, the indirect dispatch struct needs only 12 bytes (x/y/z) and doesn't require more than 4 bytes of alignment. In short, it's binding 2 rows of dispatch: the one it tries to validate, and the next one. It's not technically an issue I guess, but it smells like a bug. I don't understand where that 64 bytes is coming from.

image

Buffer content:

image

@teoxoy
Copy link
Member

teoxoy commented Dec 9, 2024

It's most likely because min_storage_buffer_offset_alignment on your GPU is 64 and so it has to bind more of the buffer to access the data. The shader uses a push constant to further offset reading from the buffer.

@djeedai
Copy link
Author

djeedai commented Dec 9, 2024

min_storage_buffer_offset_alignment is 32 on my GPU

@teoxoy
Copy link
Member

teoxoy commented Dec 9, 2024

Right, it is expected though, it comes from here:

let binding_size = 2 * alignment + (buffer_size % alignment);
binding_size.min(buffer_size)

The comment above the code tries to explain why, it's quite dense and I don't remember the details off the top of my head.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: performance How fast things go area: validation Issues related to validation, diagnostics, and error handling
Projects
Status: Todo
Development

No branches or pull requests

5 participants