Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Use storage buffers for clustered forward point lights #3989

Closed

Conversation

superdump
Copy link
Contributor

@superdump superdump commented Feb 19, 2022

Objective

This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.

Solution

  • Add RenderDevice to the Material and SpecializedMaterial trait ::key() functions to allow setting flags on the keys depending on feature/limit availability
  • Make GpuPointLights and ViewClusterBuffers into enums containing UniformVec and StorageBuffer variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.
  • Appropriate shader defs in the shader code to handle the two cases

Context on some decisions / open questions

  • I'm using max_storage_buffers_per_shader_stage >= 3 as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
  • Should ViewClusterBuffers and ViewClusterBindings be merged, duplicating the count variables into the enum variants?

@superdump
Copy link
Contributor Author

Apparently storage buffers cannot have 0 size so that case needs taking care of.

@superdump superdump self-assigned this Feb 19, 2022
@alice-i-cecile alice-i-cecile added A-Rendering Drawing game state to the screen C-Bug An unexpected or incorrect behavior C-Performance A change motivated by improving speed, memory usage or compile times and removed S-Needs-Triage This issue needs to be labelled labels Feb 20, 2022
@superdump superdump force-pushed the clustered-forward-storage-buffers branch from 9d9b13d to a2fc835 Compare February 25, 2022 15:32
@superdump superdump marked this pull request as ready for review February 25, 2022 15:33
@superdump
Copy link
Contributor Author

I'm going to rework the StorageVec a little.

@superdump superdump force-pushed the clustered-forward-storage-buffers branch from b71569c to 305a3d0 Compare March 1, 2022 23:58
@superdump
Copy link
Contributor Author

@superdump superdump force-pushed the clustered-forward-storage-buffers branch from ccf9af8 to ffad065 Compare March 24, 2022 09:54
@superdump superdump changed the title Use storage buffers for clustered forward bindings Use storage buffers for clustered forward point lights Mar 24, 2022
examples/README.md Outdated Show resolved Hide resolved
Copy link
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few notes on small code quality issues.

@superdump superdump force-pushed the clustered-forward-storage-buffers branch from 2891f2f to 132f203 Compare March 25, 2022 11:07
@superdump superdump added this to the Bevy 0.7 milestone Mar 25, 2022
Copy link
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great; I'm happy with this now. I think the few little tweaks you made strike a nice balance between avoiding over-engineering and making sure that we actually keep changes synchronized.

Obviously, I don't have in-depth feedback about the details of the rendering strategy, but I'm happy with the code quality and examples.

Copy link
Contributor

@jakobhellermann jakobhellermann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good as far as I can tell.

crates/bevy_pbr/src/render/light.rs Show resolved Hide resolved
crates/bevy_pbr/src/render/light.rs Outdated Show resolved Hide resolved
crates/bevy_render/src/render_resource/storage_buffer.rs Outdated Show resolved Hide resolved
// NOTE: This needs to run after prepare_lights. As prepare_lights is an exclusive system,
// just adding it to the non-exclusive systems in the Prepare stage means it runs after
// prepare_lights.
render::prepare_clusters.label(RenderLightSystems::PrepareClusters),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this label shouldn't be needed since #4224

Comment on lines +61 to +69
commands.spawn_bundle(PointLightBundle {
point_light: PointLight {
range: LIGHT_RADIUS,
intensity: LIGHT_INTENSITY,
..default()
},
transform: Transform::from_translation((RADIUS as f64 * unit_sphere_p).as_vec3()),
..default()
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting a random colour here makes it look prettier I think

@superdump superdump force-pushed the clustered-forward-storage-buffers branch from 33bdd28 to 9da6d5d Compare April 6, 2022 12:09
@superdump
Copy link
Contributor Author

The lighting example is failing on wasm for me on this branch (including when I rebase on main):

panicked at 'wgpu error: Validation Error

Caused by:
    In Device::create_render_pipeline
      note: label = `pbr_opaque_mesh_pipeline`
    Internal error in VERTEX shader: ERROR: 0:41: 'data' : array members of structs must specify a size
ERROR: 0:44: 'data' : array members of structs must specify a size
ERROR: 0:47: 'data' : array members of structs must specify a size

Presumably due to this part of mesh_view_bind_group.wgsl?

struct ClusterLightIndexLists {
    data: array<u32>;
};

With the current state of this PR, I cannot reproduce this running in Chrome. I still have this irritating gamepad issue in Firefox which I have to hack out manually in the JS:

panicked at 'error getting gamepads: JsValue(TypeError: getObject(...).getGamepads is not a function
init/imports.wbg.__wbg_getGamepads_125e8440ea7a98c8/<@http://0.0.0.0:8000/target/wasm_example.js:1071:35
handleError@http://0.0.0.0:8000/target/wasm_example.js:260:18
init/imports.wbg.__wbg_getGamepads_125e8440ea7a98c8@http://0.0.0.0:8000/target/wasm_example.js:1070:74
web_sys::features::gen_Navigator::Navigator::get_gamepads::h38f15820bcd54585@http://0.0.0.0:8000/target/wasm_example_bg.wasm:wasm-function[17624]:0x9e4430
gilrs_core::platform::platform::gamepad::Gilrs::next_event::ha502708d74993467@http://0.0.0.0:8000/target/wasm_example_bg.wasm:wasm-function[659]:0x24484f
gilrs_core::Gilrs::next_event::h1def7b279c39043e@http://0.0.0.0:8000/target/wasm_example_bg.wasm:wasm-function[25101]:0xa1cd38
gilrs::gamepad::Gilrs::next_event_priv::h581e4e4191b9fd6b@http://0.0.0.0:8000/target/wasm_example_bg.wasm:wasm-function[719]:0x272637
gilrs::gamepad::Gilrs::next_event::h795f889b0d271268@http:/…
wasm_example.js:420:21

@superdump
Copy link
Contributor Author

I made an issue for the gamepad API problem here: #4431

@cart
Copy link
Member

cart commented Apr 6, 2022

With the current state of this PR, I cannot reproduce this running in Chrome. I still have this irritating gamepad issue in Firefox which I have to hack out manually in the JS:

Weeeird I just tried the latest changes and those also work for me (in firefox, which is where it was failing before). Idk if that was a "cart's system had something weird" thing or a "it used to be broken and now its not" thing.

Regardless, it works now so I guess we're good :)

@cart
Copy link
Member

cart commented Apr 6, 2022

bors r+

bors bot pushed a commit that referenced this pull request Apr 6, 2022
# Objective

- Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene
- Fixes #3605 
- Based on top of #4079 

This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.

## Solution

- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.
- Appropriate shader defs in the shader code to handle the two cases

## Context on some decisions / open questions

- I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
- Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants?


Co-authored-by: Carter Anderson <mcanders1@gmail.com>
@bors
Copy link
Contributor

bors bot commented Apr 6, 2022

Timed out.

@cart
Copy link
Member

cart commented Apr 6, 2022

bors r+

bors bot pushed a commit that referenced this pull request Apr 6, 2022
# Objective

- Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene
- Fixes #3605 
- Based on top of #4079 

This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.

## Solution

- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.
- Appropriate shader defs in the shader code to handle the two cases

## Context on some decisions / open questions

- I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
- Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants?


Co-authored-by: Carter Anderson <mcanders1@gmail.com>
@bors
Copy link
Contributor

bors bot commented Apr 6, 2022

This PR was included in a batch that timed out, it will be automatically retried

bors bot pushed a commit that referenced this pull request Apr 6, 2022
# Objective

- Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene
- Fixes #3605 
- Based on top of #4079 

This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.

## Solution

- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.
- Appropriate shader defs in the shader code to handle the two cases

## Context on some decisions / open questions

- I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
- Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants?


Co-authored-by: Carter Anderson <mcanders1@gmail.com>
@bors
Copy link
Contributor

bors bot commented Apr 7, 2022

Timed out.

@mockersf
Copy link
Member

mockersf commented Apr 7, 2022

bors r+

bors bot pushed a commit that referenced this pull request Apr 7, 2022
# Objective

- Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene
- Fixes #3605 
- Based on top of #4079 

This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.

## Solution

- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.
- Appropriate shader defs in the shader code to handle the two cases

## Context on some decisions / open questions

- I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
- Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants?


Co-authored-by: Carter Anderson <mcanders1@gmail.com>
@bors bors bot changed the title Use storage buffers for clustered forward point lights [Merged by Bors] - Use storage buffers for clustered forward point lights Apr 7, 2022
@bors bors bot closed this Apr 7, 2022
bors bot pushed a commit that referenced this pull request Apr 15, 2022
# Objective

- Fixes #4234
- Fixes #4473 
- Built on top of #3989
- Improve performance of `assign_lights_to_clusters`

## Solution

- Remove the OBB-based cluster light assignment algorithm and calculation of view space AABBs
- Implement the 'iterative sphere refinement' algorithm used in Just Cause 3 by Emil Persson as documented in the Siggraph 2015 Practical Clustered Shading talk by Persson, on pages 42-44 http://newq.net/dl/pub/s2015_practical.pdf
- Adapt to also support orthographic projections
- Add `many_lights -- orthographic` for testing many lights using an orthographic projection

## Results

- `assign_lights_to_clusters` in `many_lights` before this PR on an M1 Max over 1500 frames had a median execution time of 1.71ms. With this PR it is 1.51ms, a reduction of 0.2ms or 11.7% for this system.

---

## Changelog

- Changed: Improved cluster light assignment performance

Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
@Shatur Shatur mentioned this pull request Apr 17, 2022
aevyrie pushed a commit to aevyrie/bevy that referenced this pull request Jun 7, 2022
# Objective

- Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene
- Fixes bevyengine#3605 
- Based on top of bevyengine#4079 

This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.

## Solution

- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.
- Appropriate shader defs in the shader code to handle the two cases

## Context on some decisions / open questions

- I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
- Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants?


Co-authored-by: Carter Anderson <mcanders1@gmail.com>
aevyrie pushed a commit to aevyrie/bevy that referenced this pull request Jun 7, 2022
# Objective

- Fixes bevyengine#4234
- Fixes bevyengine#4473 
- Built on top of bevyengine#3989
- Improve performance of `assign_lights_to_clusters`

## Solution

- Remove the OBB-based cluster light assignment algorithm and calculation of view space AABBs
- Implement the 'iterative sphere refinement' algorithm used in Just Cause 3 by Emil Persson as documented in the Siggraph 2015 Practical Clustered Shading talk by Persson, on pages 42-44 http://newq.net/dl/pub/s2015_practical.pdf
- Adapt to also support orthographic projections
- Add `many_lights -- orthographic` for testing many lights using an orthographic projection

## Results

- `assign_lights_to_clusters` in `many_lights` before this PR on an M1 Max over 1500 frames had a median execution time of 1.71ms. With this PR it is 1.51ms, a reduction of 0.2ms or 11.7% for this system.

---

## Changelog

- Changed: Improved cluster light assignment performance

Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
ItsDoot pushed a commit to ItsDoot/bevy that referenced this pull request Feb 1, 2023
# Objective

- Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene
- Fixes bevyengine#3605 
- Based on top of bevyengine#4079 

This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.

## Solution

- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.
- Appropriate shader defs in the shader code to handle the two cases

## Context on some decisions / open questions

- I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
- Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants?


Co-authored-by: Carter Anderson <mcanders1@gmail.com>
ItsDoot pushed a commit to ItsDoot/bevy that referenced this pull request Feb 1, 2023
# Objective

- Fixes bevyengine#4234
- Fixes bevyengine#4473 
- Built on top of bevyengine#3989
- Improve performance of `assign_lights_to_clusters`

## Solution

- Remove the OBB-based cluster light assignment algorithm and calculation of view space AABBs
- Implement the 'iterative sphere refinement' algorithm used in Just Cause 3 by Emil Persson as documented in the Siggraph 2015 Practical Clustered Shading talk by Persson, on pages 42-44 http://newq.net/dl/pub/s2015_practical.pdf
- Adapt to also support orthographic projections
- Add `many_lights -- orthographic` for testing many lights using an orthographic projection

## Results

- `assign_lights_to_clusters` in `many_lights` before this PR on an M1 Max over 1500 frames had a median execution time of 1.71ms. With this PR it is 1.51ms, a reduction of 0.2ms or 11.7% for this system.

---

## Changelog

- Changed: Improved cluster light assignment performance

Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Bug An unexpected or incorrect behavior C-Performance A change motivated by improving speed, memory usage or compile times
Projects
Archived in project
Status: Done
Development

Successfully merging this pull request may close these issues.

Use storage buffers for clustered forward bindings for 'unlimited' point lights
6 participants