Because of the glorious amount of testing needed, the WebGPU CTS is organized as a tree of arbitrary depth (a filesystem with multiple tests per file).
Each directory may have a `README.txt` describing its contents.
Tests are grouped in large families (each of which has a `README.txt`); the root and first few levels look like the following (some nodes omitted for simplicity):
- `api` with tests for full coverage of the JavaScript API surface of WebGPU.
  - `validation` with positive and negative tests for all the validation rules of the API.
  - `operation` with tests that check the result of performing valid WebGPU operations, taking advantage of parametrization to exercise interactions between parts of the API.
  - `regression` for one-off tests that reproduce bugs found in implementations to prevent the bugs from appearing again.
- `shader` with tests for full coverage of the shaders that can be passed to WebGPU.
  - `validation`.
  - `execution` similar to `api/operation`.
  - `regression`.
- `idl` with tests to check that the WebGPU IDL is correctly implemented, for example that objects expose exactly the correct members, and that methods throw when passed incomplete dictionaries.
- `web-platform` with tests for Web platform-specific interactions like `GPUSwapChain` and `<canvas>`, WebXR, and `GPUQueue.copyExternalImageToTexture`.
At the same time, test hierarchies can be used to split the testing of a single sub-object into several files for maintainability. For example, `GPURenderPipeline` has a large descriptor, and some parts of it can be tested independently, like `vertex_input` vs. `primitive_topology` vs. `blending`, but they all live under the `render_pipeline` directory.
In addition to the test tree, each test can be parameterized. For coverage it is important to test all enum values, for example all values of `GPUTextureFormat`. Instead of having a loop that iterates over every `GPUTextureFormat`, it is better to parameterize the test over them. Each format then gets a different entry in the test list, which helps WebGPU implementers debug the test, or suppress the failure without losing test coverage while they fix the bug.
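A minimal sketch of such a parameterized test, assuming the framework's params builder; the import paths and the `kTestFormats` list are illustrative stand-ins for the suite's real format tables:

```ts
import { makeTestGroup } from '../../common/framework/test_group.js';
import { GPUTest } from '../gpu_test.js';

export const g = makeTestGroup(GPUTest);

// Hypothetical format list; real tests would use the suite's format tables.
const kTestFormats = ['rgba8unorm', 'bgra8unorm', 'rgba16float'] as const;

g.test('texture_creation,format')
  .params(u => u.combine('format', kTestFormats))
  .fn(t => {
    const { format } = t.params;
    // Each format becomes its own entry in the test list, so an implementer can
    // debug or suppress a single failing format without losing the others.
    t.device.createTexture({
      size: [4, 4, 1],
      format,
      usage: GPUTextureUsage.COPY_DST,
    });
  });
```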
Extra capabilities (limits and features) are often tested in the same files as the rest of the API. For example, a compressed texture format capability would simply add a `GPUTextureFormat` to the parametrization lists of many tests, while a capability adding significant new functionality like ray-tracing could have a separate subtree.
Operation tests for optional features should be skipped using `t.selectDeviceOrSkipTestCase()` or `t.skip()`. Validation tests should test the behavior both with and without the capability enabled (via `t.selectDeviceOrSkipTestCase()`), to ensure the functionality is valid only when the capability is enabled.
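A sketch of how an operation test for an optional feature might skip unsupported devices, assuming `t.selectDeviceOrSkipTestCase()` accepts a feature name and is awaited inside the test body (where exactly the skip call goes may differ between harness versions):

```ts
g.test('bc_formats,operation')
  .params(u => u.combine('format', ['bc1-rgba-unorm', 'bc3-rgba-unorm'] as const))
  .fn(async t => {
    // Acquire a device with the optional feature, or skip this case if the
    // adapter does not support it (a skip, not a failure).
    await t.selectDeviceOrSkipTestCase('texture-compression-bc');
    // ... exercise t.params.format against the feature-enabled device ...
  });
```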
Validation tests check the validation rules that are (or will be) set by the WebGPU spec. Validation tests try to carefully trigger the individual validation rules in the spec, without simultaneously triggering other rules.
Validation errors generally generate WebGPU errors, not exceptions. But check the spec on a case-by-case basis.
Like all `GPUTest`s, `ValidationTest`s are wrapped in both types of error scope. These "catch-all" error scopes look for any errors during the test and report them as test failures. Since error scopes can be nested, validation tests can nest an error scope to expect that there are errors from specific operations.
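For example, a validation test can use a nested-error-scope helper (assumed here to be called `expectValidationError`) to check a control success case and an error case in the same test:

```ts
g.test('createBuffer,usage_must_not_be_zero')
  .fn(t => {
    // Control case: a valid descriptor should produce no validation error.
    t.expectValidationError(() => {
      t.device.createBuffer({ size: 16, usage: GPUBufferUsage.COPY_DST });
    }, false);

    // Error case: a usage of 0 is invalid, so exactly this operation should
    // generate a validation error, caught by a nested error scope rather than
    // by the test's catch-all scopes.
    t.expectValidationError(() => {
      t.device.createBuffer({ size: 16, usage: 0 });
    }, true);
  });
```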
Test parameterization can help write many validation tests more succinctly, while making it easier for both authors and reviewers to be confident that an aspect of the API is tested fully. Examples:
- `webgpu:api,validation,render_pass,resolve:resolve_attachment:*`
- `webgpu:api,validation,createBindGroupLayout:bindingTypeSpecific_optional_members:*`
Use your own discretion when deciding the balance between heavily parameterizing a test and writing multiple separate tests.
There are many aspects that should be tested in all validation tests:
- each individual argument to a method call (including `this`) or member of a descriptor dictionary should be tested, including:
  - what happens when an error object is passed.
  - what happens when an optional feature enum or method is used.
  - what happens for numeric values when they are at 0, too large, too small, etc.
- each validation rule in the specification should be checked, both with a control success case and with error cases.
- each set of arguments or state that interact for validation.
When testing numeric values, it is important to check on both sides of the boundary: if the error happens for value N and not N - 1, both should be tested. Alignment of integer values should also be tested but boundary testing of alignment should be between a value aligned to 2^N and a value aligned to 2^(N-1).
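A sketch of boundary parameterization; the limit value and test name are illustrative, and real tests would read the actual device limit rather than hard-coding it:

```ts
const kMaxDimension = 8192; // illustrative; real tests would read device.limits.maxTextureDimension2D

g.test('texture_width_boundary')
  .params(u =>
    u.combine('width', [
      kMaxDimension - 1, // below the boundary: control success case
      kMaxDimension,     // exactly at the boundary: still valid
      kMaxDimension + 1, // just past the boundary: must fail
    ])
  )
  .fn(t => {
    const shouldError = t.params.width > kMaxDimension;
    t.expectValidationError(() => {
      t.device.createTexture({
        size: [t.params.width, 1, 1],
        format: 'rgba8unorm',
        usage: GPUTextureUsage.COPY_DST,
      });
    }, shouldError);
  });
```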
Finally, this is probably also where we would test that extensions follow the rule that if the browser supports a feature but it is not enabled on the device, then calling methods from that feature throws a `TypeError`.
- Test that providing unknown properties that are definitely not part of any feature is valid/ignored. (Unfortunately, due to the rules of IDL, adding a member to a dictionary is always a breaking change, so this is how we have to test this unless we can get a "strict" dictionary type in IDL. We can't test adding members from non-enabled extensions.)
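A sketch of the unknown-member check; the member name is obviously hypothetical, and the type assertion is only there to satisfy TypeScript:

```ts
g.test('unknown_dictionary_member_is_ignored')
  .fn(t => {
    // Per WebIDL rules, unknown dictionary members must be silently ignored,
    // so this descriptor should behave exactly like one without the extra member.
    t.expectValidationError(() => {
      t.device.createBuffer({
        size: 16,
        usage: GPUBufferUsage.COPY_DST,
        definitelyNotARealMember: 42,
      } as GPUBufferDescriptor);
    }, false);
  });
```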
Operation tests test the actual results of using the API. They execute (sometimes significant) code and check that the result is within the expected set of behaviors (which can be quite complex to compute).
Note that operation tests need to test a lot of interactions between different parts of the API, and so can become quite complex. Try to reduce the complexity by utilizing combinatorics and helpers, and splitting/merging test files as needed.
Operation tests are usually `GPUTest`s. As a result, they automatically fail on any validation errors that occur during the test.
When it's easier to write an operation test in a way that would include invalid cases, use `ParamsBuilder.filter`/`.unless` to avoid the invalid cases, or detect and expect validation errors in those cases.
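A sketch of pruning invalid combinations with the params builder; the format list and the validity rule are stand-ins:

```ts
const kFormats = ['rgba8unorm', 'rg8unorm', 'r8unorm'] as const;

g.test('copy_between_textures,format')
  .params(u =>
    u
      .combine('srcFormat', kFormats)
      .combine('dstFormat', kFormats)
      // Pretend only same-format copies are valid: drop invalid combinations here
      // rather than letting them trigger validation errors inside an operation test.
      .unless(p => p.srcFormat !== p.dstFormat)
  )
  .fn(t => {
    // ... perform the copy with t.params.srcFormat / t.params.dstFormat and check the result ...
  });
```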
Use helpers like `expectContents` (and more to come) to check the values of data on the GPU. (These are "eventual expectations": the harness will wait for them to finish at the end of the test.)
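A sketch using an `expectContents`-style helper; the helper name follows the text above, and current versions of the harness may call it something else:

```ts
g.test('write_buffer,roundtrip')
  .fn(t => {
    const data = new Uint32Array([1, 2, 3, 4]);
    const buffer = t.device.createBuffer({
      size: data.byteLength,
      usage: GPUBufferUsage.COPY_DST | GPUBufferUsage.COPY_SRC,
    });
    t.device.queue.writeBuffer(buffer, 0, data);

    // Registers an eventual expectation: the harness reads the buffer back and
    // compares it after the test body finishes, failing the test on mismatch.
    t.expectContents(buffer, data);
  });
```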
When testing something inside a shader, it's not always necessary to output the result to a render output. In fragment shaders, you can output to a storage buffer. In vertex shaders, you can't - but you can render with points (simplest), send the result to the fragment shader, and output it from there. (Someday, we may end up wanting a helper for this.)
Default value tests (for arguments and dictionary members) should usually be operation tests: all you have to do is include `undefined` in the parameterizations of other tests, to make sure the behavior with `undefined` has the same expected result as when the default value is specified explicitly.
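For example (a sketch of a parameterized operation test; `topology` defaults to `'triangle-list'` per the spec):

```ts
g.test('primitive_topology,operation')
  .params(u =>
    // `undefined` exercises the defaulting path; it must produce the same result
    // as explicitly passing the default value 'triangle-list'.
    u.combine('topology', [undefined, 'triangle-list', 'line-list'] as const)
  )
  .fn(t => {
    // ... create a render pipeline with primitive: { topology: t.params.topology }
    //     and check that the rendered output matches the expectation for that topology ...
  });
```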
TODO: figure out how to implement these. #332
These tests test only rules that come directly from WebIDL. For example:
- Values out of range for `[EnforceRange]` cause exceptions.
- Required function arguments and dictionary members cause exceptions if omitted.
- Arguments and dictionary members cause exceptions if passed the wrong type.
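A sketch of one such exception check, assuming a `shouldThrow`-style assertion helper; `GPUSize64` arguments such as the offset of `clearBuffer` are `[EnforceRange]`, so a negative value must throw rather than generate a WebGPU error:

```ts
g.test('clearBuffer,enforce_range')
  .fn(t => {
    const encoder = t.device.createCommandEncoder();
    const buffer = t.device.createBuffer({ size: 16, usage: GPUBufferUsage.COPY_DST });

    // [EnforceRange] unsigned long long: a negative offset is out of range,
    // so the binding layer throws a TypeError before any WebGPU validation runs.
    t.shouldThrow('TypeError', () => {
      encoder.clearBuffer(buffer, -4);
    });
  });
```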
They may also test positive cases like the following, but the behavior of these should be tested in operation tests.
- OK to omit optional arguments/members.
- OK to pass the correct argument/member type (or of any type in a union type).
Every overload of every method should be tested.
Stress tests and manual tests for WebGPU that are not intended to be run in an automated way.
Unit tests for the test framework (`src/common/framework/`).
A demo of test hierarchies for the purpose of testing the standalone
test runner page.