-
Notifications
You must be signed in to change notification settings - Fork 576
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GLSL/MSL: Support VK_KHR_zero_initialize_workgroup_memory #2444
base: main
Are you sure you want to change the base?
Conversation
// In the latter case, we cannot take the value as-is, as it can be changed to anything. | ||
// Rather, we assume it to be *one* for the sake of initializer. | ||
bool is_literal_array_size = constant_type.array_size_literal.back(); | ||
uint32_t count = is_literal_array_size ? constant_type.array.back() : 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work only because we kinda throw away the array size of an OpConstantNull later when initializing the threadgroup variable anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the case for shared (threadgroup) variable, but not for other kinds. In those cases, the initializer list is still used. In fact, I'm not 100% sure if the handling is correct for those cases where initializer is used (although I hope it is if the languages act like C/C++). At least Apple's MSL compiler does not complain about it. For GLSL, I'm not an expert to say if it is right. We could again use GL_EXT_null_initializer
to eliminate doubt but it seems to require some extra work I don't quite know how. I tried to set the array size to 0, but it seems to turn the array into a scalar value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discovered the logic this part is a little problematic for glsl. I'm adding a test and trying to resolve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Biggest concern is lack of test.
…antNull The usage appears in Vulkan CTS related to extension VK_KHR_zero_initialize_workgroup_memory.
The extension allows initializing workgroup memory with OpConstantNull, and is a part of Vulkan 1.3. In GLSL, the support is realized via GL_EXT_null_initializer. While in MSL, we have to generate code snippets to let the threads initialize the memory.
234ede4
to
4387ba7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable. Mostly nits.
res += join(" = ", to_initializer_expression(variable)); | ||
else if (options.force_zero_initialized_variables && type_can_zero_initialize(type)) | ||
res += join(" = ", to_zero_initialized_expression(get_variable_data_type_id(variable))); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brace on next line.
|
||
// "Real" workgroup variables in compute shaders needs extra caretaking. | ||
// They need to be initialized with an extra routine as they come in arbitrary form. | ||
if (var.storage == StorageClassWorkgroup && var.initializer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop braces here.
if (is_boolean) { | ||
type.basetype = SPIRType::Boolean; | ||
} | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brace on next line.
// We use short to represent bool for threadgroup variable to workaround compiler bug, | ||
// so we do a temporary fixup here. Alas. (see the type_to_glsl method) | ||
bool is_boolean = type.basetype == SPIRType::Boolean; | ||
if (is_boolean) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop brace.
string var_stride_name = join(var_name, "_stride"); | ||
string var_ptr2_name = join(var_name, "_ptr2"); | ||
|
||
statement("threadgroup uint *", var_ptr_name, " = (threadgroup uint *)&", var_name, ";"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gross, but it's certainly not worse than the alternative ...
begin_scope(); | ||
statement(join(to_name(var.self), " = ", to_initializer_expression(var), ";")); | ||
end_scope(); | ||
if (is_boolean) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drop brace.
The extension allows initializing workgroup memory with OpConstantNull, and is a part of Vulkan 1.3.
In GLSL, the support is realized via GL_EXT_null_initializer. While in MSL, we have to generate code snippets to let the threads initialize the memory. Currently I don't have a corresponding implementation for HLSL due to my lack of knowledge.
When connected with MoltenVK, the implementation can pass all Vulkan CTS compute tests enabled by the extension when an independent fix for specialization constants (KhronosGroup/MoltenVK#2434) is applied. The MoltenVK code (with downstream fix) is available at: https://github.com/dboyan/MoltenVK/tree/zero_init_wg_mem-specfix (need to manually specify SPIRV-Cross tree with this PR branch when using
fetchDependencies
while building).Vulkan CTS compute cases before enabling VK_KHR_zero_initialize_workgroup_memory:
Vulkan CTS compute cases after enabling VK_KHR_zero_initialize_workgroup_memory (with KhronosGroup/MoltenVK#2434 applied):
Closes: #2443