Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TOCTOU race condition on minImageExtent/maxImageExtent? #1144

Open
haasn opened this issue Dec 23, 2019 · 13 comments
Open

TOCTOU race condition on minImageExtent/maxImageExtent? #1144

haasn opened this issue Dec 23, 2019 · 13 comments
Assignees

Comments

@haasn
Copy link

haasn commented Dec 23, 2019

Some platforms, like Xcb/Xlib/Win32, require that minImageExtent == maxImageExtent == currentExtent. However, this creates some problematic race conditions in practice.

The problem, specifically, is that the window's extent can change (e.g. due to the window manager) in between the calls to vkGetPhysicalDeviceSurfaceCapabilitiesKHR and vkCreateSwapchainKHR. When this happens, we might end up creating a swapchain with an imageExtent that is outside of the bounds of the permitted min/max at the time of the actual swapchain creation. As far as I can tell, this is technically undefined behavior? (In practice, one of the validation layers likes to error out on swapchain creation, so this is not just a theoretical concern - we have to work around it by suppressing errors related to swapchain creation from that validation layer).

Even ignoring split second race conditions, I'm not sure how a field like minImageExtent or maxImageExtent being "dynamic" is supposed to make sense at all. The specification for vkCreateSwapchainKHR only says:

imageExtent must be between minImageExtent and maxImageExtent, inclusive, where minImageExtent and maxImageExtent are members of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface

but the fact that minImageExtent and maxImageExtent can change over time means this wording is vague. There's no clear link between any particular call to vkGetPhysicalDeviceSurfaceCapabilitiesKHR and the actual swapchain creation.

Obviously the intent was to mean "at the time of swapchain creation", but this leads back into aforementioned race condition, because there's always going to ba delay in between those two calls.

I'm also not sure what purpose this design is supposed to serve. There's nothing stopping users from making swapchains with the "wrong" size, either in practice or in theory. In fact, doing so can sometimes be beneficial. The obvious solution to me would be to remove the min == max == current assumption and just have min and max always represent the (unchanging) hardware capabilities on all platforms.

@krOoze
Copy link
Contributor

krOoze commented Dec 23, 2019

practically a dup of more general #388. Hopefully it gets prioritized though.
We also got into it at KhronosGroup/Vulkan-ValidationLayers#1340. Quick fix would possibly be to remove the VU; the driver should rely on VK_ERROR_OUT_OF_DATE_KHR given the possiblity of the "race condition" described above.

@haasn
Copy link
Author

haasn commented Dec 23, 2019

FWIW I sent a patchset to mesa that makes it return SUBOPTIMAL in the case of this "race condition". (That patchset also tried changing the min/max extent as per my recommendation, but they correctly pointed out that the current spec forbids this)

(Side note: OUT_OF_DATE seems like an excessively harsh error to return for size mismatches. Using SUBOPTIMAL instead allows applications to handle resizes much more smoothly and gracefully)

Simply removing the validation check, combined with the above patchset, allows this to be a non-issue in practice - but the theoretical problem remains (as tracked by #388).

@krOoze
Copy link
Contributor

krOoze commented Dec 23, 2019

(Side note: OUT_OF_DATE seems like an excessively harsh error to return for size mismatches. Using SUBOPTIMAL instead allows applications to handle resizes much more smoothly and gracefully)

Depends what you mean by "gracefully". There would be artifacts if the framebuffer size does not match the window. They intentionally removed that possibility in spec on some platforms, because the output is practically useless(, though theoretically it is possible to present into it, I think). The expectation of SUBOPTIMAL is that it is somewhat usable, though the details are kept to the impl (EDIT: except Android and Stadia, where there is guaranteed scale).

The idea is that you get the VK_ERROR_OUT_OF_DATE_KHR at some point and if so, restart the swapchain recreation. Apps should already have code to handle VK_ERROR_OUT_OF_DATE_KHR.

Simply removing the validation check, combined with the above patchset, allows this to be a non-issue in practice - but the theoretical problem remains

Then the spec only needs to state the obvious that the window size can change at any time in platform specific manner.

@haasn
Copy link
Author

haasn commented Dec 28, 2019

Depends what you mean by "gracefully". There would be artifacts if the framebuffer size does not match the window.

In practice, the artifacts of framebuffer size mismatch are maybe a few pixels of lost content / black borders here and there (usually on the border you're resizing from), whereas the artifacts of excessive swapchain recreation is the entire window becoming visually frozen while resizing.

It's possible to become so insistent on only rendering 'perfect' images that you end up not rendering anything at all. A few pixels of artifacts here and there are much more graceful than great swathes of unupdated, corrupted framebuffer contents that are the typical result of a failure to repaint in a timely fashion.

@krOoze
Copy link
Contributor

krOoze commented Dec 30, 2019

Black frame can actually be better than a mangled image in an unspecified way...

@gfxstrand
Copy link
Contributor

imageExtent must be between minImageExtent and maxImageExtent, inclusive, where minImageExtent and maxImageExtent are members of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface

but the fact that minImageExtent and maxImageExtent can change over time means this wording is vague. There's no clear link between any particular call to vkGetPhysicalDeviceSurfaceCapabilitiesKHR and the actual swapchain creation.

Obviously the intent was to mean "at the time of swapchain creation", but this leads back into aforementioned race condition, because there's always going to ba delay in between those two calls.

Yes, as-stated that's racy and impossible to actually satisfy reliably in the middle of a resize. I believe the way this should have been spec'd and the way you should have interpreted it is something more like this:

imageExtent should be between minImageExtent and maxImageExtent, inclusive, where minImageExtent and maxImageExtent are members of the VkSurfaceCapabilitiesKHR structure returned by vkGetPhysicalDeviceSurfaceCapabilitiesKHR for the surface. If imageExtent is not between minImageExtent and maxImageExtent at any time, the swapchain will return VK_ERROR_OUT_OF_DATE

That may not be quite precise enough for spec text but I think it's pretty close and carries the intent.

@ShabbyX
Copy link
Contributor

ShabbyX commented Dec 9, 2022

I'm also not sure what purpose this design is supposed to serve. There's nothing stopping users from making swapchains with the "wrong" size, either in practice or in theory. In fact, doing so can sometimes be beneficial. The obvious solution to me would be to remove the min == max == current assumption and just have min and max always represent the (unchanging) hardware capabilities on all platforms.

The new VK_EXT_surface_maintenance1 and VK_EXT_swapchain_maintenance1 extensions address this issue. Please see this blog post for a summary.

However, leaving this open due to this remaining issue:

but the fact that minImageExtent and maxImageExtent can change over time means this wording is vague. There's no clear link between any particular call to vkGetPhysicalDeviceSurfaceCapabilitiesKHR and the actual swapchain creation.

It would still be good to have vkCreateSwapchainKHR return OUT_OF_DATE right away if this happens.

@krOoze
Copy link
Contributor

krOoze commented Dec 9, 2022

@ShabbyX Can you elaborate what has changed here? The spec still explicitly says that all the extents must equal window size on win32 and Xlib\XCB platforms.

@ShabbyX
Copy link
Contributor

ShabbyX commented Dec 10, 2022

The spec says:

With Xcb, minImageExtent, maxImageExtent, and currentExtent must always equal the window size.

And in the note below says:

Due to above restrictions, unless VkSwapchainPresentScalingCreateInfoEXT is used to specify handling of disparities between surface and swapchain dimensions, it is only possible to create a new swapchain on this platform with imageExtent being equal to the current size of the window, as reported in VkSurfaceCapabilitiesKHR::currentExtent.

You can then take a look at the VkSwapchainPresentScalingCreateInfoEXT struct which says:

When an application presents a swapchain image with dimensions different than those of the target surface, different behavior is possible on different platforms per their respective specifications:

...

(note: there's currently a typo in the spec where some places say presentScaling instead of scalingBehavior, that'll be fixed in the next spec release)

@krOoze
Copy link
Contributor

krOoze commented Dec 10, 2022

@ShabbyX Notes are only informative, so that would imply nothing has changed. If the extents are required to always be equal to window's size, then that implies they are unable to differ from the window size.

@gfxstrand
Copy link
Contributor

Yes, that's the way window systems work. You have to create your swapchain the same size as the window. On some platforms like android, the window-system auto-scales but on most platforms you just get garbage or clipping or something. There's nothing that Khronos can do about that by writing spec text nor should the window systems necessarily change either.

The only real problem here is that for a while it was in the VUs which technically means that the driver can segfault on you or something like that. That's obviously too harsh for this particular case. Worst case, you should get OUT_OF_DATE when you try to present on it.

Yes, that means that you have to keep re-creating the swapchain until things settle. Again, this is nothing new and it's not something that we can fix. This is the way window systems have worked for decades. It's just that, with Vulkan, this loop is exposed to the client because it is manually managing swapchains instead of the driver re-creating them under the hood as-needed.

It would still be good to have vkCreateSwapchainKHR return OUT_OF_DATE right away if this happens.

I'm not convinced we want to do that. It'll only mean additional round-trips to the compositor to check the size at swapchain creation time and still doesn't fix the race because the window can still resize between swapchain creation and the first present.

@cubanismo
Copy link

Notes are only informative, so that would imply nothing has changed. If the extents are required to always be equal to window's size, then that implies they are unable to differ from the window size.

The notes just guide you towards the "solution" of VkSwapchainPresentScalingCreateInfoEXT. You have to go read up on that structure to see how it can be used to avoid the need to recreate your swapchain if you're happy with some well-defined scaling/centering behavior instead, assuming the driver + platform you're running on supports such scaling. The spec doesn't currently require support for scaling.

Other than that, as @gfxstrand says, there's nothing magic drivers can or should do here. Handling window resizes well is hard, and the driver can only do a worse job of it if you try to hide the handling in there compared to what a well-written application can do, at least if they're using the other mechanisms from [swapchain/surface]_maintenance1 aimed at making swapchain recreation a better-defined and lower-overhead workflow.

It would still be good to have vkCreateSwapchainKHR return OUT_OF_DATE right away if this happens.

I'm not convinced we want to do that. It'll only mean additional round-trips to the compositor to check the size at swapchain creation time and still doesn't fix the race because the window can still resize between swapchain creation and the first present.

I've waffled on this quite a bit, but I do come down in favor of allowing vkCreateSwapchainKHR to return this error code now. There are many cases where we already know the sizes mismatch without a round trip, and we have to just create an empty shell of a swapchain and wait until someone tries to acquire from it to return an error. This also provides a sensible error code to replace the need for the impossible VUs saying various properties must match a moving target, like window sizes, fullscreen/non-fullscreen-only present modes, etc.

@krOoze
Copy link
Contributor

krOoze commented Dec 11, 2022

The only real problem here is that for a while it was in the VUs which technically means that the driver can segfault on you or something like that.

Oh, ok, so the VU was changed. And I see there's a separate set of min and max caps.

To make this distinction clear, perhaps the minScaledImageExtent and maxScaledImageExtent should be mentioned at the same places as minImageExtent. I would find that much clearer than the wordy note. E.g.:

With Win32, minImageExtent, maxImageExtent, and currentExtent must always equal the window size. But minScaledImageExtent and maxScaledImageExtent may differ from windows size on this platform if supportedPresentScaling is supported.

PS: Are these new values minScaledImageExtent and maxScaledImageExtent invariants, or is there a analogous TOCTOU problem created there?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants