
Commit

Implement push-constants (#574)
* Initial implementation of push_constants

* Initial implementation of push_constants

* Better handling of limits
Fix lint errors.

* One more lint error.

* And one more typo.

* Change limits to use hyphens
Combine the code that accesses features and limits for adapters and devices, since they are almost identical.
Add an error for unknown limit

* Forgot to uncomment some lines

* Removed a couple of more comments

* Fix typo in comment.
Minor cleanup.

* Move push_constants stuff to extras.py

* Fix flake and codegen

* Fix failing test

* Linux is failing even though my Mac isn't.  I have to figure out what's wrong.  :-(

* And one last lint problem

* First pass at documentation.

* First pass at documentation.

* Undo accidental modification

* See

* Found one carryover from move to 22.1 that I forgot to include.
Undoing all typo mistakes and moving to a different push.

* Yikes.  One more _api change

* Yikes.  One more _api change

* Apply suggestions from code review

Co-authored-by: Almar Klein <almar@almarklein.org>

* Update comments.
Comment @create_and_release as requested.

* Tiny change to get tests to run again.

* Apply suggestions from code review

Co-authored-by: Almar Klein <almar@almarklein.org>

---------

Co-authored-by: Almar Klein <almar@almarklein.org>
Co-authored-by: Korijn van Golen <k.vangolen@clinicalgraphics.com>
3 people authored Sep 17, 2024
1 parent 466af69 commit 0a243bb
Showing 8 changed files with 538 additions and 98 deletions.
97 changes: 97 additions & 0 deletions docs/backends.rst
@@ -59,6 +59,103 @@ The wgpu_native backend provides a few extra functionalities:
:return: Device
:rtype: wgpu.GPUDevice

The wgpu_native backend provides support for push constants.
Since WebGPU does not support this feature, documentation on its use is hard to find.
A full explanation of push constants and their use in Vulkan can be found
`here <https://vkguide.dev/docs/chapter-3/push_constants/>`_.
Using push constants in WGPU closely follows the Vulkan model.

The advantage of push constants is that they are typically faster to update than uniform buffers.
Modifications to push constants are included in the command encoder; updating a uniform
buffer involves sending a separate command to the GPU.
The disadvantage of push constants is that their size limit is much smaller than that
of uniform buffers. The limit is guaranteed to be at least 128 bytes, and 256 bytes is typical.

Given an adapter, first determine if it supports push constants::

    >>> "push-constants" in adapter.features
    True

If push constants are supported, determine the maximum number of bytes that can
be allocated for push constants::

    >>> adapter.limits["max-push-constant-size"]
    256

You must tell the adapter to create a device that supports push constants,
and you must tell it the number of bytes of push constants that you are using.
Overestimating is okay::

    device = adapter.request_device(
        required_features=["push-constants"],
        required_limits={"max-push-constant-size": 256},
    )

Declaring a push constant in your shader code is similar to declaring a uniform buffer.
Fields used only in the ``@vertex`` shader, fields used only in the ``@fragment``
shader, and fields used in both shaders must each be grouped into their own
contiguous block::

    struct PushConstants {
        // vertex shader
        vertex_transform: mat4x4f,
        // fragment shader
        fragment_transform: mat4x4f,
        // used in both
        generic_transform: mat4x4f,
    }
    var<push_constant> push_constants: PushConstants;

To create the pipeline layout for this shader, use
``wgpu.backends.wgpu_native.create_pipeline_layout`` instead of
``device.create_pipeline_layout``. It takes an additional argument,
``push_constant_layouts``, describing the layout of the push constants.
For the example above::

    push_constant_layouts = [
        {"visibility": ShaderStage.VERTEX, "start": 0, "end": 64},
        {"visibility": ShaderStage.FRAGMENT, "start": 64, "end": 128},
        {"visibility": ShaderStage.VERTEX + ShaderStage.FRAGMENT, "start": 128, "end": 192},
    ]

Finally, you set the value of the push constants by using
``wgpu.backends.wgpu_native.set_push_constants``. The third and fourth arguments
are the byte offset and byte size of each region::

    set_push_constants(this_pass, ShaderStage.VERTEX, 0, 64, <64 bytes>)
    set_push_constants(this_pass, ShaderStage.FRAGMENT, 64, 64, <64 bytes>)
    set_push_constants(this_pass, ShaderStage.VERTEX + ShaderStage.FRAGMENT, 128, 64, <64 bytes>)

Push constants must be set separately for each of the three visibility regions. If the
push constants have already been set, on the next use you only need to call
``set_push_constants`` on those bytes you wish to change.
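As a concrete sketch of the layout arithmetic above, the three ``mat4x4f`` regions can
be packed into a single 192-byte block with numpy (numpy is not required by wgpu-py;
the variable names here are illustrative, not part of the API):

```python
import numpy as np

# Each mat4x4f field is 16 f32 values = 64 bytes, so the struct above occupies
# three 64-byte regions at byte offsets 0, 64, and 128 (192 bytes in total).
MAT4_BYTES = 16 * 4

vertex_transform = np.eye(4, dtype=np.float32)
fragment_transform = np.eye(4, dtype=np.float32)
generic_transform = np.eye(4, dtype=np.float32)

# Concatenate in declaration order to get one contiguous block of bytes.
block = np.concatenate(
    [m.ravel() for m in (vertex_transform, fragment_transform, generic_transform)]
)
assert block.nbytes == 3 * MAT4_BYTES  # 192 bytes, within the typical 256-byte limit

# (offset, size) pairs matching the push_constant_layouts entries above.
regions = {
    "vertex": (0, MAT4_BYTES),
    "fragment": (MAT4_BYTES, MAT4_BYTES),
    "both": (2 * MAT4_BYTES, MAT4_BYTES),
}
```

Each ``(offset, size)`` pair is what you would pass as the third and fourth arguments
of ``set_push_constants`` for that region.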

.. py:function:: wgpu.backends.wgpu_native.create_pipeline_layout(device, *, label="", bind_group_layouts, push_constant_layouts=[])

   This function provides the same functionality as :func:`wgpu.GPUDevice.create_pipeline_layout`,
   but accepts an extra ``push_constant_layouts`` argument.
   When using push constants, this argument is a list of dictionaries, where each
   dictionary has three fields: ``visibility``, ``start``, and ``end``.

   :param device: The device on which we are creating the pipeline layout
   :param label: An optional label
   :param bind_group_layouts: The bind group layouts, as in :func:`wgpu.GPUDevice.create_pipeline_layout`
   :param push_constant_layouts: Described above.

.. py:function:: wgpu.backends.wgpu_native.set_push_constants(render_pass_encoder, visibility, offset, size_in_bytes, data, data_offset=0)

   This function requires that the device was created with the ``push-constants``
   feature. Push constants are a buffer of bytes available to the ``vertex`` and
   ``fragment`` shaders. They are similar to a bound buffer, but the bytes are set
   using this function call.

   :param render_pass_encoder: The render pass encoder to which we are pushing constants.
   :param visibility: The stages (vertex, fragment, or both) to which these constants are visible
   :param offset: The offset into the push constants at which the bytes are to be written
   :param size_in_bytes: The number of bytes to copy from the data
   :param data: The data to copy to the buffer
   :param data_offset: The starting offset in the data at which to begin copying.
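As a sketch of how the arguments interact, the call copies ``size_in_bytes`` bytes,
starting at ``data_offset`` within ``data``, into the push-constant storage at
``offset``. The helper below is a pure-numpy model for illustration only; it is not
part of the wgpu-py API:

```python
import numpy as np

def emulate_set_push_constants(block, offset, size_in_bytes, data, data_offset=0):
    """Pure-numpy model of the byte copy performed by set_push_constants.

    `block` stands in for the GPU-side push-constant storage; only the offset
    arithmetic and the length check are modeled here.
    """
    src = np.frombuffer(data, dtype=np.uint8)
    if data_offset + size_in_bytes > src.nbytes:
        raise ValueError("data is too short for the requested copy")
    block[offset : offset + size_in_bytes] = src[data_offset : data_offset + size_in_bytes]

# Write 64 bytes into the third region (offset 128) of a 192-byte block.
block = np.zeros(192, dtype=np.uint8)
payload = np.arange(64, dtype=np.uint8).tobytes()
emulate_set_push_constants(block, 128, 64, payload)
```

Note that a too-short ``data`` (after accounting for ``data_offset``) raises a
``ValueError``, which is the behavior the tests below exercise.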


The js_webgpu backend
---------------------
164 changes: 164 additions & 0 deletions tests/test_set_constant.py
@@ -0,0 +1,164 @@
import numpy as np
import pytest

import wgpu.utils
from tests.testutils import can_use_wgpu_lib, run_tests
from wgpu import TextureFormat
from wgpu.backends.wgpu_native.extras import create_pipeline_layout, set_push_constants

if not can_use_wgpu_lib:
    pytest.skip("Skipping tests that need the wgpu lib", allow_module_level=True)


"""
This code is an amazingly slow way of adding together two 10-element arrays of 32-bit
integers defined by push constants and storing the result in an output buffer.
The first operand of each addition is purposely pulled in the vertex stage, and the
second in the fragment stage, so that we can verify that stage-separated push
constants are handled correctly.
The source code assumes the topology is point-list, so that each call to vertexMain
corresponds to one call to fragmentMain.
"""
COUNT = 10

SHADER_SOURCE = (
    f"""
    const COUNT = {COUNT}u;
    """
    """
    // Put the results here
    @group(0) @binding(0) var<storage, read_write> data: array<u32, COUNT>;

    struct PushConstants {
        values1: array<u32, COUNT>,  // VERTEX constants
        values2: array<u32, COUNT>,  // FRAGMENT constants
    }
    var<push_constant> push_constants: PushConstants;

    struct VertexOutput {
        @location(0) index: u32,
        @location(1) value: u32,
        @builtin(position) position: vec4f,
    }

    @vertex
    fn vertexMain(
        @builtin(vertex_index) index: u32,
    ) -> VertexOutput {
        return VertexOutput(index, push_constants.values1[index], vec4f(0, 0, 0, 1));
    }

    @fragment
    fn fragmentMain(@location(0) index: u32,
                    @location(1) value: u32
    ) -> @location(0) vec4f {
        data[index] = value + push_constants.values2[index];
        return vec4f();
    }
    """
)

BIND_GROUP_ENTRIES = [
    {"binding": 0, "visibility": "FRAGMENT", "buffer": {"type": "storage"}},
]


def setup_pipeline():
    adapter = wgpu.gpu.request_adapter(power_preference="high-performance")
    device = adapter.request_device(
        required_features=["push-constants"],
        required_limits={"max-push-constant-size": 128},
    )
    output_texture = device.create_texture(
        # Actual size is immaterial. Could just be 1x1
        size=[128, 128],
        format=TextureFormat.rgba8unorm,
        usage="RENDER_ATTACHMENT|COPY_SRC",
    )
    shader = device.create_shader_module(code=SHADER_SOURCE)
    bind_group_layout = device.create_bind_group_layout(entries=BIND_GROUP_ENTRIES)
    render_pipeline_layout = create_pipeline_layout(
        device,
        bind_group_layouts=[bind_group_layout],
        push_constant_layouts=[
            {"visibility": "VERTEX", "start": 0, "end": COUNT * 4},
            {"visibility": "FRAGMENT", "start": COUNT * 4, "end": COUNT * 4 * 2},
        ],
    )
    pipeline = device.create_render_pipeline(
        layout=render_pipeline_layout,
        vertex={
            "module": shader,
            "entry_point": "vertexMain",
        },
        fragment={
            "module": shader,
            "entry_point": "fragmentMain",
            "targets": [{"format": output_texture.format}],
        },
        primitive={
            "topology": "point-list",
        },
    )
    render_pass_descriptor = {
        "color_attachments": [
            {
                "clear_value": (0, 0, 0, 0),  # only first value matters
                "load_op": "clear",
                "store_op": "store",
                "view": output_texture.create_view(),
            }
        ],
    }

    return device, pipeline, render_pass_descriptor


def test_normal_push_constants():
    device, pipeline, render_pass_descriptor = setup_pipeline()
    vertex_call_buffer = device.create_buffer(size=COUNT * 4, usage="STORAGE|COPY_SRC")
    bind_group = device.create_bind_group(
        layout=pipeline.get_bind_group_layout(0),
        entries=[
            {"binding": 0, "resource": {"buffer": vertex_call_buffer}},
        ],
    )

    encoder = device.create_command_encoder()
    this_pass = encoder.begin_render_pass(**render_pass_descriptor)
    this_pass.set_pipeline(pipeline)
    this_pass.set_bind_group(0, bind_group)

    buffer = np.random.randint(0, 1_000_000, size=(2 * COUNT), dtype=np.uint32)
    set_push_constants(this_pass, "VERTEX", 0, COUNT * 4, buffer)
    set_push_constants(this_pass, "FRAGMENT", COUNT * 4, COUNT * 4, buffer, COUNT * 4)
    this_pass.draw(COUNT)
    this_pass.end()
    device.queue.submit([encoder.finish()])

    info_view = device.queue.read_buffer(vertex_call_buffer)
    result = np.frombuffer(info_view, dtype=np.uint32)
    expected_result = buffer[0:COUNT] + buffer[COUNT:]
    assert all(result == expected_result)


def test_bad_set_push_constants():
    device, pipeline, render_pass_descriptor = setup_pipeline()
    encoder = device.create_command_encoder()
    this_pass = encoder.begin_render_pass(**render_pass_descriptor)

    def zeros(n):
        return np.zeros(n, dtype=np.uint32)

    with pytest.raises(ValueError):
        # Buffer is too short
        set_push_constants(this_pass, "VERTEX", 0, COUNT * 4, zeros(COUNT - 1))

    with pytest.raises(ValueError):
        # Buffer is too short once the data offset is taken into account
        set_push_constants(this_pass, "VERTEX", 0, COUNT * 4, zeros(COUNT + 1), 8)


if __name__ == "__main__":
    run_tests(globals())
34 changes: 32 additions & 2 deletions tests/test_wgpu_native_basics.py
@@ -424,18 +424,48 @@ def test_features_are_legal():
    )
    # We can also use underscore
    assert are_features_wgpu_legal(["push_constants", "vertex_writable_storage"])
    # We can also use camel case
    assert are_features_wgpu_legal(["PushConstants", "VertexWritableStorage"])


def test_features_are_illegal():
    # lowerCamelCase is not legal (though PascalCase is)
    assert not are_features_wgpu_legal(["pushConstants"])
    # writable is misspelled
    assert not are_features_wgpu_legal(
        ["multi-draw-indirect", "vertex-writeable-storage"]
    )
    assert not are_features_wgpu_legal(["my-made-up-feature"])


def are_limits_wgpu_legal(limits):
    """Returns true if the dict of limits is legal. Determining whether a specific
    set of limits is supported on a particular device would make the tests fragile,
    so we only verify that the names are legal limit names."""
    adapter = wgpu.gpu.request_adapter(power_preference="high-performance")
    try:
        adapter.request_device(required_limits=limits)
        return True
    except RuntimeError as e:
        assert "Unsupported features were requested" in str(e)
        return True
    except KeyError:
        return False


def test_limits_are_legal():
    # A standard limit. Probably exists
    assert are_limits_wgpu_legal({"max-bind-groups": 8})
    # A common extension limit
    assert are_limits_wgpu_legal({"max-push-constant-size": 128})
    # We can also use underscore
    assert are_limits_wgpu_legal({"max_bind_groups": 8, "max_push_constant_size": 128})
    # We can also use camel case
    assert are_limits_wgpu_legal({"maxBindGroups": 8, "maxPushConstantSize": 128})


def test_limits_are_not_legal():
    # "max-bind-group" is misspelled (missing the final "s")
    assert not are_limits_wgpu_legal({"max-bind-group": 8})


if __name__ == "__main__":
    run_tests(globals())

35 changes: 34 additions & 1 deletion tests_mem/testutils.py
@@ -145,7 +145,40 @@ def ob_name_from_test_func(func):


def create_and_release(create_objects_func):
    """
    This wrapper goes around a test that takes a single argument n. That test should
    be a generator function that yields a descriptor followed by n different objects
    corresponding to the name of the test function. Hence a test named
    `test_release_foo_bar` would yield a descriptor followed by n FooBar objects.

    The descriptor is a dictionary with three fields, each optional.
    In a typical situation, there will be n FooBar objects after the test, and after
    releasing, there will be zero. However, sometimes there are auxiliary objects,
    in which case it is necessary to provide one or more fields.

    The keys "expected_counts_after_create" and "expected_counts_after_release" each
    have as their value a sub-dictionary giving the number of still-alive WGPU objects.
    The key "expected_counts_after_create" gives the expected state after the
    n objects have been created and put into a list; "expected_counts_after_release"
    gives the state after the n objects have been released.

    These sub-dictionaries have as their keys the names of WGPU object types, and
    their value is a tuple of two integers: the first is the number of Python objects
    expected to exist and the second is the number of native objects. Any type not in
    the sub-dictionary has an implied value of (0, 0).

    The key "ignore" has as its value a collection of object types that we should
    ignore in this test. Ideally we should not use this, but currently there are a few
    cases where we cannot reliably predict the number of objects in wgpu-native.

    If the descriptor doesn't contain an "expected_counts_after_create", then the
    default is {"FooBar": (n, n)}, where "FooBar" is derived from the name of the test.
    If the descriptor doesn't contain an "expected_counts_after_release", then the
    default is {}, indicating that creating and releasing the objects should
    completely clean up after itself.
    """

    def core_test_func():
        """The core function that does the testing."""
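The defaulting behavior described in the docstring of `create_and_release` can be
sketched in isolation. Both helper names below are hypothetical, for illustration
only (the real code derives the object name via `ob_name_from_test_func`, which
operates on the function object rather than its name):

```python
def ob_name_from_test_name(test_name):
    # Hypothetical helper: "test_release_foo_bar" -> "FooBar".
    assert test_name.startswith("test_release_")
    parts = test_name[len("test_release_"):].split("_")
    return "".join(p.capitalize() for p in parts)

def apply_descriptor_defaults(descriptor, test_name, n):
    # Fill in the two optional count keys as the docstring describes:
    # after create, expect n Python objects and n native objects of the type
    # named by the test; after release, expect everything cleaned up.
    name = ob_name_from_test_name(test_name)
    out = dict(descriptor)
    out.setdefault("expected_counts_after_create", {name: (n, n)})
    out.setdefault("expected_counts_after_release", {})
    return out

d = apply_descriptor_defaults({}, "test_release_foo_bar", 3)
```

A test that yields an explicit "expected_counts_after_create" would keep its own
value, since `setdefault` only fills in missing keys.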
