Enhance Vulkan PSO caching #80296
Hi Pedro!
This goes against good Vulkan practices. See Arseny Kapoulkine's "Creating a Robust Pipeline Cache with Vulkan" and "Robust pipeline cache serialization". I'd recommend against this direction (in fact, Godot should do the opposite and perform a full checksum validation of the blob).
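A minimal sketch of what "full checksum validation of the blob" could look like, loosely following the approach in those articles: write an engine-side header in front of the blob returned by vkGetPipelineCacheData, and discard the file before it ever reaches vkCreatePipelineCache if any field or the checksum disagrees. The CacheFileHeader layout, the magic value, and the choice of FNV-1a are assumptions for illustration, not Godot's actual format; a real implementation would fill the identity fields from VkPhysicalDeviceProperties (vendorID, deviceID, driverVersion, pipelineCacheUUID).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical on-disk header written in front of the driver's cache blob.
struct CacheFileHeader {
    uint32_t magic;                      // identifies our file format
    uint32_t data_size;                  // size of the blob that follows
    uint64_t data_hash;                  // checksum of the blob
    uint32_t vendor_id;                  // VkPhysicalDeviceProperties::vendorID
    uint32_t device_id;                  // VkPhysicalDeviceProperties::deviceID
    uint32_t driver_version;             // VkPhysicalDeviceProperties::driverVersion
    std::array<uint8_t, 16> cache_uuid;  // pipelineCacheUUID (VK_UUID_SIZE == 16)
};

constexpr uint32_t kCacheMagic = 0x50534F43; // arbitrary value chosen for this sketch

// 64-bit FNV-1a over the blob; any reasonable hash would do.
uint64_t fnv1a64(const uint8_t *data, size_t size) {
    uint64_t h = 14695981039346656037ull;
    for (size_t i = 0; i < size; ++i) {
        h ^= data[i];
        h *= 1099511628211ull;
    }
    return h;
}

// Serializes header + blob into one buffer ready to be written to disk.
std::vector<uint8_t> pack_cache_file(CacheFileHeader identity, const std::vector<uint8_t> &blob) {
    identity.magic = kCacheMagic;
    identity.data_size = static_cast<uint32_t>(blob.size());
    identity.data_hash = fnv1a64(blob.data(), blob.size());
    std::vector<uint8_t> file(sizeof(CacheFileHeader) + blob.size());
    std::memcpy(file.data(), &identity, sizeof(CacheFileHeader));
    if (!blob.empty()) {
        std::memcpy(file.data() + sizeof(CacheFileHeader), blob.data(), blob.size());
    }
    return file;
}

// Returns the blob to pass as pInitialData, or an empty vector if the file must be discarded.
std::vector<uint8_t> unpack_cache_file(const std::vector<uint8_t> &file, const CacheFileHeader &current) {
    if (file.size() < sizeof(CacheFileHeader)) return {};  // truncated header
    CacheFileHeader h;
    std::memcpy(&h, file.data(), sizeof(CacheFileHeader));
    const uint8_t *blob = file.data() + sizeof(CacheFileHeader);
    const size_t blob_size = file.size() - sizeof(CacheFileHeader);
    if (h.magic != kCacheMagic || h.data_size != blob_size) return {};
    // Different GPU, driver, or driver-reported cache UUID: start from an empty cache.
    if (h.vendor_id != current.vendor_id || h.device_id != current.device_id ||
        h.driver_version != current.driver_version || h.cache_uuid != current.cache_uuid) return {};
    if (h.data_hash != fnv1a64(blob, blob_size)) return {};  // corrupted blob
    return std::vector<uint8_t>(blob, blob + blob_size);
}
```

The point of the design is that the engine never trusts the driver to reject a damaged or foreign blob: an empty result simply means "create the VkPipelineCache without initial data".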
You added this flag, but then call While
I wasn't aware of the drivers not being reliable in how they manage the cache. I'll restore the relevant parts of the code. Regarding the external sync flag, the spec says:
So, unless the spec fails to mention that the function that retrieves the cache contents needs synchronization as well, it should be fine. It's as if the implementation still performed some synchronization even without that flag, just less exhaustively. However, I may be misunderstanding, or this may be another real-world case that diverges from the spec. What do you think?
Hi!
There are two difficult things in computer science: caches and naming things. 🤣 Vulkan caches are one of the ugliest parts of the API, like everything that has caches.
Externally synchronized means you guarantee you won't be doing write access from one thread while another thread is accessing (reading or writing) the object. The example I gave used
You have Thread A requesting to read from pipelineCache, but Thread B wants to write to pipelineCache. If you request the externally synchronized flag, you're telling Vulkan that the driver should not worry about this scenario. Therefore the driver may be modifying pipelineCache internally inside vkCreateGraphicsPipelines on one thread while, on the other thread, it is internally inside vkGetPipelineCacheData copying the cache into the array you submitted. This is a race condition. This assumes both threads are using the same VkPipelineCache handle; it's OK to use different handles. If you were to use external sync, you'd have to place a mutex in both threads (and in anything else that calls a Vulkan function taking a VkPipelineCache as a parameter). This is why
Update: In case you're wondering where all this comes from, this is (slightly cryptically) explained in 3.6. Threading Behavior
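The rule above can be modeled without any Vulkan calls. In this sketch FakePipelineCache is a hypothetical stand-in that does no internal locking, as an externally synchronized cache would behave: add_pipeline() plays the role of vkCreateGraphicsPipelines writing into the cache, and get_data() plays the role of vkGetPipelineCacheData copying it out. Because the driver is assumed not to synchronize, every call that touches the handle, the read included, takes the same application-side mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a VkPipelineCache created with
// VK_PIPELINE_CACHE_CREATE_EXTERNALLY_SYNCHRONIZED_BIT: no internal locking.
struct FakePipelineCache {
    std::vector<uint8_t> blob;
};

// The application owns the synchronization: one mutex per cache handle.
class SynchronizedPipelineCache {
public:
    // Analogous to vkCreateGraphicsPipelines(..., cache, ...) adding compiled data.
    void add_pipeline(uint8_t compiled_byte) {
        std::lock_guard<std::mutex> lock(mutex_);
        cache_.blob.push_back(compiled_byte);
    }

    // Analogous to vkGetPipelineCacheData(device, cache, &size, data).
    // Without the lock, this copy could race with a concurrent push_back above.
    std::vector<uint8_t> get_data() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return cache_.blob;
    }

private:
    mutable std::mutex mutex_;
    FakePipelineCache cache_;
};

// Thread A compiles pipelines while thread B snapshots the cache, as in the scenario above.
size_t run_race_scenario(size_t pipelines) {
    SynchronizedPipelineCache cache;
    std::thread writer([&] {
        for (size_t i = 0; i < pipelines; ++i) cache.add_pipeline(static_cast<uint8_t>(i));
    });
    std::thread reader([&] {
        for (int i = 0; i < 100; ++i) (void)cache.get_data();
    });
    writer.join();
    reader.join();
    return cache.get_data().size();
}
```

Using a separate VkPipelineCache handle per thread would avoid the shared lock entirely, at the cost of merging or saving several caches later.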
Thanks for taking the time to explain it. I've re-read that section and I found the missing piece for my mental model (bolds mine):
The thing is that I was expecting to see
In this particular case (with
In conclusion, I've updated the PR, keeping the flag but locking the class mutex around the use of
PS: I've renamed it from "Overhaul" to "Enhance", given that it ended up being humbler than initially expected. 😁
Hi! Looks better. Just two remarks I noticed after I wrote my comment.
Remark 1
I said two commands that have read access should not need sync. However, the Vulkan spec doesn't seem to make such a guarantee (it only talks about "access"; it never distinguishes "read" from "write" access). So the VK spec doesn't really guarantee
Remark 2
I'm new to Godot, however I see that there are macros called
Personally I prefer the following idiom (using unnamed scopes), but it doesn't seem to be common in Godot's codebase:
```cpp
void function()
{
    int a = 0;
    {
        // The lock only lives until the end of this inner scope.
        _THREAD_SAFE_METHOD_
        vkGetPipelineCacheData( ... );
    }
}
```
One last comment about PSO caches
PSO caches are often used and abused. But knowing who asked for them tells a lot about what the caches do and how they should behave (i.e. the more we stray from the original intended use case, the more resistance we may encounter). When gamedevs requested PSO caches, the primary use case everyone involved had in mind was PS4 ports (Vulkan is based on Mantle, and Mantle was inspired by GNM, the PS4's API). For example, if you try "Detroit: Become Human", on first launch you'll be met with a ~30-minute shader cache compilation stage that uses all of your CPU cores. What is happening is the following:
Of course, if we look at Godot's use case (or that of any other popular engine like Unity or UE4/5), it looks nothing like that. In fact, we'd be happy if all threads could share write access to the same cache with semi-frequent reads (not so "read only"). Since then, drivers have worked to optimize for the torture cases that normal engines have been producing in the real world; however, it's always good to know what "the path of least resistance" is supposed to look like.
What this means
Any decent Vulkan caching implementation would use a two-stage cache:
If this implementation is used, then NOT using external sync may be faster as long as the cache is read-only most of the time, because that's the optimal path.
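The description of the two stages did not survive here, so the following is only one plausible reading, stated as an assumption: a large read-only stage that many threads query concurrently under a shared lock, plus a small write stage for newly compiled entries behind an exclusive lock, occasionally merged into the read-only stage. TwoStageCache, its methods, and the use of std::shared_mutex (C++17) are illustrative choices, not a description of how any particular driver works.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Illustrative two-stage cache: key = pipeline state hash, value = compiled blob.
class TwoStageCache {
public:
    std::optional<std::string> find(size_t key) const {
        {
            // Fast path: many readers may hold the shared lock at once.
            std::shared_lock<std::shared_mutex> read_lock(stable_mutex_);
            auto it = stable_.find(key);
            if (it != stable_.end()) return it->second;
        }
        // Slow path: entries compiled since the last merge. A merge running between
        // the two lookups can cause a spurious miss, which for a cache only means a recompile.
        std::lock_guard<std::mutex> lock(pending_mutex_);
        auto it = pending_.find(key);
        if (it != pending_.end()) return it->second;
        return std::nullopt;
    }

    // New entries never touch the read-only stage directly.
    void insert(size_t key, std::string blob) {
        std::lock_guard<std::mutex> lock(pending_mutex_);
        pending_.emplace(key, std::move(blob));
    }

    // Folds new entries into the read-only stage; the exclusive lock is held briefly.
    void merge() {
        std::unique_lock<std::shared_mutex> write_lock(stable_mutex_);
        std::lock_guard<std::mutex> lock(pending_mutex_);
        stable_.merge(pending_);  // moves nodes whose keys are not already in stable_
        pending_.clear();
    }

    size_t stable_size() const {
        std::shared_lock<std::shared_mutex> read_lock(stable_mutex_);
        return stable_.size();
    }

private:
    mutable std::shared_mutex stable_mutex_;
    std::unordered_map<size_t, std::string> stable_;
    mutable std::mutex pending_mutex_;
    std::unordered_map<size_t, std::string> pending_;
};
```

In this shape, a hit in the stable stage only contends with writers during merge(), which mirrors why skipping external sync can pay off when the cache is read-only most of the time.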
I updated my comment with a "What this means" section @RandomShaper
I had to use the hacky way because that's now a static method.
If they don't state any sync requirement for that function, aren't they implicitly guaranteeing that concurrent calls are safe (assuming no other function that may modify the cache runs without proper sync)?
The thing is that everything is governed by the section
And once we request the EXTERNALLY_SYNC flag, it is governed by this paragraph:
Because the documentation doesn't say that
Or you could ask Khronos for clarification so that the spec is amended to guarantee this detail.
Updated, honoring the most recent feedback.
CC @darksylinc if you can do another review pass. This should solve intermittent crashes in CI tests, which are quite annoying.
LGTM
Looks good to me too. I trust darksylinc's review and RandomShaper's testing. So let's go ahead and merge this when convenient.
Thanks!
PR godotengine#80296 introduced a regression because it checks whether the VK_EXT_pipeline_creation_cache_control extension has been enabled before using it, but it turns out the process is a bit more convoluted than that (a Vulkan driver may support the extension but then report that the feature is not supported).
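The regression comes from treating "the extension is available" as "the feature is usable". In Vulkan, the feature is reported separately through VkPhysicalDevicePipelineCreationCacheControlFeatures(EXT)::pipelineCreationCacheControl, queried with vkGetPhysicalDeviceFeatures2, and it also has to be enabled at device creation before VK_PIPELINE_CACHE_CREATE_EXTERNALLY_SYNCHRONIZED_BIT may be used. The sketch below keeps only the decision logic and assumes a pre-1.3 device where the extension path applies; DeviceCaps and its fields are hypothetical placeholders for whatever the engine records after those queries.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of what the engine learned while creating the device.
struct DeviceCaps {
    bool cache_control_extension_enabled = false;  // extension listed in ppEnabledExtensionNames
    bool cache_control_feature_supported = false;  // pipelineCreationCacheControl reported VK_TRUE
    bool cache_control_feature_enabled = false;    // feature requested in the device-creation pNext chain
};

// Same value as VK_PIPELINE_CACHE_CREATE_EXTERNALLY_SYNCHRONIZED_BIT in the Vulkan headers.
constexpr uint32_t kExternallySynchronizedBit = 0x00000001;

// Only request the flag when the feature itself is enabled, not merely the extension.
uint32_t pipeline_cache_create_flags(const DeviceCaps &caps) {
    const bool usable = caps.cache_control_extension_enabled &&
                        caps.cache_control_feature_supported &&
                        caps.cache_control_feature_enabled;
    return usable ? kExternallySynchronizedBit : 0u;
}
```

When the flag ends up not being set, the driver keeps doing its own synchronization, so an application-side mutex around the cache stays correct, just possibly redundant.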
My goal was just to address any potential threading issue, but I couldn't help but overhaul the whole thing to address various other issues. Highlights:
- Godot doesn't need to manage its own pipeline cache header. The Vulkan implementation is expected to do so, which includes considering a cache out of date after a driver update, for instance. UPDATE 2: This is not as true as I thought. See the discussion below.
- vkMergePipelineCaches would likely have to be involved). Therefore, I've taken the simple path of keeping both separate. UPDATE (2023-08-30): The file name now also includes the device name so each one has its own cache, preventing invalidation when switching GPUs.
- VK_PIPELINE_CACHE_CREATE_EXTERNALLY_SYNCHRONIZED_BIT and use of LocalVector instead of Vector.
NOTE: I'm marking this as Needs testing because at the time of writing I'm not sure yet whether this fixes the bug referenced below (I still don't know what part of the current implementation is causing it).
UPDATE: Maybe the cause was loading the cache at https://github.com/godotengine/godot/pull/80296/files#diff-0d547c4e6687e653e54df1623d33ab291396cca8c6265101c211a028e610b1e9L9138 while the writing task might still be running.
UPDATE 2: It wasn't.
Fixes #78749.
UPDATE (2023-08-30): Fixes #81150.
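The 2023-08-30 change gives each GPU its own cache file by including the device name in the file name. A minimal sketch of such a naming scheme, assuming a hypothetical "pipelines.<device>.cache" pattern (not necessarily the name Godot uses): characters that are unsafe in file names are replaced so that a device name such as "AMD Radeon(TM) Graphics" still maps to one valid file.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical naming scheme: one cache file per GPU, so switching devices
// doesn't overwrite (and thus invalidate) the other device's cache.
std::string pipeline_cache_file_name(const std::string &device_name) {
    std::string safe;
    for (unsigned char ch : device_name) {
        // Keep letters, digits, '-' and '_' (default "C" locale); map everything else to '_'.
        safe += (std::isalnum(ch) || ch == '-' || ch == '_') ? static_cast<char>(ch) : '_';
    }
    return "pipelines." + safe + ".cache";
}
```

Two names that differ only in replaced characters would share a file; appending the vendorID and deviceID would be one way to rule that out.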