[backdrop_dyn] Handle upstream pipeline failure #553
This change looks as expected. This is among my planned changes for robust dynamic memory, so it's good to see this change made in the minimal form.
I haven't run it, but the CI smoke tests should catch this being wrong.
Has any thought been put into applying this more consistently throughout the pipeline?
@@ -34,6 +46,9 @@ fn main(
sh_row_width[local_id.x] = path.bbox.z - path.bbox.x;
row_count = path.bbox.w - path.bbox.y;
sh_offset[local_id.x] = path.tiles;
} else {
// Explicitly zero the row width, just in case.
It isn't very clear to me what "just in case" here means.
I see that we're not doing a scan over sh_row_width, and so in theory we won't ever be reading this value.
To be clear, I think this change is fine - especially once we start to use gfx-rs/wgpu#5508.
But is this actually fixing an issue, or just programming defensively?
My analysis matches Daniel's. In any case, this code won't be around very long: I want to replace it with a partition-wide prefix sum of the backdrop values, so being defensive seems preferable to over-optimizing.
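For readers unfamiliar with the backdrop step, here is a minimal sequential sketch of a row-wise running sum over backdrop values. It is not the partition-wide parallel design mentioned above, and the `Tile` struct and `accumulate_backdrop` function are hypothetical names used only for illustration.

```rust
/// Hypothetical stand-in for a tile record; only the backdrop field matters here.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Tile {
    backdrop: i32,
}

/// Turn per-tile backdrop deltas into running sums along each row.
/// `width` is the number of tiles per row; a zero width means no work.
fn accumulate_backdrop(tiles: &mut [Tile], width: usize) {
    if width == 0 {
        return; // `chunks_mut(0)` would panic, so bail out early.
    }
    for row in tiles.chunks_mut(width) {
        let mut sum = 0i32;
        for tile in row.iter_mut() {
            sum += tile.backdrop;
            tile.backdrop = sum;
        }
    }
}
```

The loop length here is driven by the row width, which is why a corrupt width matters for the stage's running time.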
This is indeed being defensive for the case where workgroup memory initialization might be turned off. The scan in the following lines is over sh_row_count, but I think it's technically possible for the loop at the very bottom to read this sh_row_width value if el_ix happens to match local_id.x?
At any rate, I think I'll leave this in and land this as is.
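The defensive zeroing discussed in this thread can be illustrated with a CPU-side sketch in Rust. This is an assumption-laden model, not the shader itself: `WG_SIZE`, `Path`, and `load_row_widths` are hypothetical, and the shared array is pre-filled with garbage to stand in for uninitialized workgroup memory.

```rust
/// Hypothetical workgroup size, kept tiny for illustration.
const WG_SIZE: usize = 4;

/// Hypothetical path record: bbox is [x0, y0, x1, y1] in tile units.
struct Path {
    bbox: [u32; 4],
}

/// Each "lane" writes its slot of the shared array. Lanes without a path
/// explicitly store zero instead of relying on the memory being zeroed.
fn load_row_widths(paths: &[Path], sh_row_width: &mut [u32; WG_SIZE]) {
    for local_id in 0..WG_SIZE {
        sh_row_width[local_id] = match paths.get(local_id) {
            // saturating_sub is an addition of this sketch, so that an
            // inverted bbox yields 0 rather than wrapping or panicking.
            Some(path) => path.bbox[2].saturating_sub(path.bbox[0]),
            None => 0,
        };
    }
}
```

With the explicit `None => 0` arm, a later read of any slot sees a defined value regardless of what the memory held before the stage ran.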
7b5c0fe to f1db451
Following #537 it is possible for the flatten stage to fail and flag a failure. In some cases this can cause invalid / corrupt bounding box data to propagate downstream, leading to a hang in the per-tile backdrop calculation loop.
Triggering this is highly subtle, so I don't have a test case as part of vello scenes that can reliably reproduce this. Regardless, it makes sense to check for the upstream failures and terminate the work in general.
I made backdrop_dyn check for any upstream failure, but I didn't make it signal its own failure flag. I also didn't change the logic in the CPU shader, since the other stages I checked (flatten, coarse) do not implement error signaling in their CPU counterparts. Let me know if you'd like me to work on those.
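As a rough illustration of the behavior described here, the following Rust sketch models a stage that exits early when any upstream failure bit is set and never writes a flag of its own. The names `BumpState`, `STAGE_FLATTEN`, and `backdrop_stage`, as well as the flag value, are assumptions made for this example and may not match the actual shader code.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical flag bit; the real constant's name and value may differ.
const STAGE_FLATTEN: u32 = 0x4;

/// Hypothetical stand-in for the shared failure state written by earlier stages.
struct BumpState {
    failed: AtomicU32,
}

/// Returns true if the stage ran, false if it bailed out on upstream failure.
/// The stage only reads `failed`; it never sets a bit of its own.
fn backdrop_stage(bump: &BumpState, row: &mut [i32]) -> bool {
    if bump.failed.load(Ordering::Relaxed) != 0 {
        // Upstream data (e.g. bounding boxes) may be corrupt: do no work.
        return false;
    }
    let mut sum = 0;
    for backdrop in row.iter_mut() {
        sum += *backdrop;
        *backdrop = sum;
    }
    true
}
```

Checking for any nonzero bit, rather than a specific stage's bit, matches the stated intent of terminating on upstream failures in general.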