Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix missing EARLY_FRAGMENT_TESTS_BIT barrier flags #81059

Conversation

darksylinc
Copy link
Contributor

These were found as part of investigating #61415 however there are more bugs yet being analyzed.

This is a low hanging fruit.

Whenever LATE_FRAGMENT_TESTS_BIT, we should almost always ask for EARLY_FRAGMENT_TESTS_BIT too.

@darksylinc darksylinc requested a review from a team as a code owner August 27, 2023 22:40
@akien-mga akien-mga added this to the 4.2 milestone Aug 28, 2023
Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand the need for this change. My understanding is limited and comes from a comment from https://themaister.net/blog/2019/08/14/yet-another-blog-explaining-vulkan-synchronization/

HELPFUL TIP ON FRAGMENT TEST STAGES

It’s somewhat confusing to have two stages which basically do the same thing. When you’re waiting for a depth map to have been rendered in an earlier render pass, you should use srcStageMask = LATE_FRAGMENT_TESTS_BIT, as that will wait for the storeOp to finish its work.

When blocking a render pass with dstStageMask, just use a mask of EARLY_FRAGMENT_TESTS | LATE_FRAGMENT_TESTS.

Accordingly, I understand that The early and late fragment tests should instead be placed on the barrier_flags (which maps onto dstStageMask). Right now it looks like both are completely missing

@akien-mga akien-mga changed the title Fix missing EARLY_FRAGMENT_TESTS_BIT barrier flags Fix missing EARLY_FRAGMENT_TESTS_BIT barrier flags Aug 28, 2023
@darksylinc
Copy link
Contributor Author

Mmm... I reviewed your comments and I suspect you're right in practice.

I agree with TheMaister's comment that this is thoroughly confusing.

The problem is that we have contradicting language in the Vulkan spec:

  1. The general rule is that every stage can potentially run in parallel. Thus if we depend on any stage, the barriers should specify all the stages we depend on.
  2. As a specific rule, the Graphics pipeline has a sequential order to it. Particularly, the first EARLY_FRAGMENT test must execute prior to any Pixel Shader, and the last LATE_FRAGMENT test happens after all EARLY_FRAGMENT & Pixel Shaders have run.

Thus if we depend on LATE_FRAGMENT, we can be certain all EARLY_FRAGMENT have already executed.

But, theoretically, if the GPU never ran any LATE_FRAGMENT and we asked to depend on it but not EARLY_FRAGMENT, does that mean our next job will not wait correctly? The theoretical answer is yes, it's a data hazard. The practical answer is no, no driver will be this picky (and it would probably break existing apps, a driver that sticks too much on what's allowed by undefined behavior often breaks working apps).

Normally, you cannot do this reasoning with the other stages: If we depend on both vertex & pixel shaders; and select to depend only on pixel shaders, and we never ran any PS (e.g. depth-only rendering), the driver may indeed start the next work without waiting for the VS. The problem is that both EARLY & LATE_FRAGMENT stages refer to the same physical resources and whether LATE_FRAGMENT will be needed cannot be known in advance.

From my perspective though, my approach was far more practical: I was drowned in mistakes that were causing wrong rendering on #61415 (specifically related to depth buffer); I saw this one looked wrong and submitted a PR while I focus on all the other problems, one fewer problem on my plate to worry about.

Personally I prefer to err on the side of correctness here. Specially because this issue will come up again in the future: Bug appears related to depth rendering, "mmmm this looks theoretically wrong", PR gets submitted.

Anyway, from what we talked in chat in order to fix #81060, it would seem the best way moving forward is to implement the barrier solver ASAP.

There is a performance concern to address: waiting on EARLY_FR. means we implicitly depend on Pixel Shaders, while depending on LATE_FR. does not.

However that is a weird reasoning: if the next job starts before PS are done, then almost certainly LATE_FR. tests have not finished and the next job cannot start. If the driver doesn't know in advance if it will be using EARLY/LATE_FR. tests, then it must assume PS are a dependency as well.

Copy link
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your reasoning makes sense! Let's go ahead with this change.

Note for maintainers. Let's wait a bit before cherrypicking this as we gather more information

@akien-mga akien-mga merged commit 9be010c into godotengine:master Aug 29, 2023
@akien-mga
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants