Optimize usage of "prepare for use" in draw and dispatch commands. #91989

Merged (1 commit) on May 15, 2024


DarioSamo
Contributor

"Prepare for use" is a command that was added because the D3D12 driver requires a step to transition resources to their required states, which meant the function had to be called unconditionally.

Because the function was written to share the loop used for validation (which is not compiled into release builds), the command was added and serialized into the render graph for every draw call, even when the API did not require it at all.

While testing Nuku Warriors, I found the command being called 72K times per frame on the Vulkan backend despite being a no-op in that API. This path will also be skipped once #91769 is merged and the driver reports support for enhanced barriers.

I have not measured the performance difference, but CPU times can likely only improve with this change, since it goes from doing unnecessary work to doing nothing.
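As a rough illustration of the idea described above, here is a minimal, self-contained sketch of recording the command only when the backend actually needs it. This is not the engine's code: `RenderGraph`, `DriverTraits`, `needs_prepare_for_use`, `record_draw`, and `count_commands` are hypothetical names invented for this example, and the real change may be structured differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; none of these names come from the Godot source.
enum class CommandType { PrepareForUse, Draw };

struct RenderGraph {
    std::vector<CommandType> commands;
    void record(CommandType type) { commands.push_back(type); }
};

struct DriverTraits {
    // Assumed flag: true when the backend needs an explicit resource
    // transition step before use (the D3D12 case described above).
    bool needs_prepare_for_use;
};

// Records one draw. The prepare command is only serialized into the
// graph when the driver reports that it needs it.
void record_draw(RenderGraph &graph, const DriverTraits &traits) {
    if (traits.needs_prepare_for_use) {
        graph.record(CommandType::PrepareForUse);
    }
    graph.record(CommandType::Draw);
}

// Returns how many commands end up in the graph for a given number of draws.
std::size_t count_commands(const DriverTraits &traits, std::size_t draw_calls) {
    RenderGraph graph;
    for (std::size_t i = 0; i < draw_calls; ++i) {
        record_draw(graph, traits);
    }
    return graph.commands.size();
}
```

In this sketch, a backend that sets the flag records two commands per draw, while a backend that does not records one, so the per-draw overhead of the no-op command disappears entirely.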

@DarioSamo DarioSamo requested a review from a team as a code owner May 15, 2024 17:33
Member

@clayjohn clayjohn left a comment


Looks good to me

@clayjohn clayjohn modified the milestones: 4.x, 4.3 May 15, 2024
@akien-mga akien-mga merged commit 4bb8c06 into godotengine:master May 15, 2024
16 checks passed
@akien-mga
Member

Thanks!

@Calinou
Member

Calinou commented May 15, 2024

For future reference, I benchmarked this change with the Bunnymark 2D benchmarks from https://github.com/godotengine/godot-benchmarks using --print-fps. These benchmarks are notoriously CPU-bound on my machine:

| Benchmark | Before | After (this PR) |
| --- | --- | --- |
| rendering/bunnymark/bunnymark_canvasitem_draw_api_5000 | 810 FPS (1.23 mspf) | 830 FPS (1.20 mspf) |
| rendering/bunnymark/bunnymark_canvasitem_draw_api_10000 | 413 FPS (2.42 mspf) | 428 FPS (2.33 mspf) |
| rendering/bunnymark/bunnymark_canvasitem_draw_api_20000 | 216 FPS (4.62 mspf) | 220 FPS (4.54 mspf) |
| rendering/bunnymark/bunnymark_meshinstance2d_5000 | 527 FPS (1.89 mspf) | 532 FPS (1.87 mspf) |
| rendering/bunnymark/bunnymark_meshinstance2d_10000 | 264 FPS (3.78 mspf) | 266 FPS (3.75 mspf) |
| rendering/bunnymark/bunnymark_meshinstance2d_20000 | 133 FPS (7.51 mspf) | 134 FPS (7.46 mspf) |
| rendering/bunnymark/bunnymark_sprite2d_5000 | 566 FPS (1.76 mspf) | 569 FPS (1.75 mspf) |
| rendering/bunnymark/bunnymark_sprite2d_10000 | 278 FPS (3.59 mspf) | 285 FPS (3.50 mspf) |
| rendering/bunnymark/bunnymark_sprite2d_20000 | 136 FPS (7.35 mspf) | 141 FPS (7.09 mspf) |

PC specifications
  • CPU: Intel Core i9-13900K
  • GPU: NVIDIA GeForce RTX 4090
  • RAM: 64 GB (2×32 GB DDR5-5800 C30)
  • SSD: Solidigm P44 Pro 2 TB
  • OS: Linux (Fedora 39)
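
To help read the numbers above, the following small sketch shows how the FPS and mspf (milliseconds per frame) columns relate and how a relative gain can be computed from a before/after pair. The helper names `frame_time_ms` and `fps_gain_percent` are made up for this example and are not part of godot-benchmarks.

```cpp
// Frame time in milliseconds for a given frame rate.
double frame_time_ms(double fps) {
    return 1000.0 / fps;
}

// Relative FPS gain, in percent, between a "before" and an "after" measurement.
double fps_gain_percent(double before_fps, double after_fps) {
    return (after_fps - before_fps) / before_fps * 100.0;
}
```

For the canvasitem_draw_api_5000 row, `frame_time_ms(810.0)` gives about 1.23 ms and `fps_gain_percent(810.0, 830.0)` gives roughly 2.5%, which matches the small but consistent improvement visible across the rows.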

4 participants