Unify WMMA and FPU operator typevars [NFC] #122
Merged
The WMMA operator previously had only a T typevar for the accumulator type, while the FPU operator had DT for the destination type and CT for the compute type. This PR unifies the two by giving both operators a compute type (CT) and an accumulator type (AT) typevar, which indicate the types to use for register-level storage and operations.
Note that the WMMA operator's typevars are not actually useful: they should match the eltype of the shared memory, because we use WMMA intrinsics to load from and store to shared memory and therefore cannot convert between shared memory and registers. However, the accumulator typevar is still needed, since it cannot be inferred from the arguments at some points, so I decided to add the compute typevar too for alignment with the FPU operator.
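As a rough illustration of the unified layout, here is a minimal, self-contained Julia sketch. The struct names (`WMMAOpSketch`, `FPUOpSketch`), the parameter order, and the helper functions are hypothetical stand-ins, not the definitions from this PR. It only shows why carrying AT as a typevar helps when no argument can supply it, as when creating an empty accumulator.

```julia
# Hypothetical stand-ins for the two operators; names and parameter order
# are assumptions made for this sketch, not the definitions from the PR.
struct WMMAOpSketch{M, N, K, CT, AT} end
struct FPUOpSketch{M, N, K, CT, AT} end

# With the same trailing typevars on both operators, generic code can query
# the compute and accumulator types in the same way for either one.
compute_type(::Type{WMMAOpSketch{M, N, K, CT, AT}}) where {M, N, K, CT, AT} = CT
compute_type(::Type{FPUOpSketch{M, N, K, CT, AT}}) where {M, N, K, CT, AT} = CT
accumulator_type(::Type{WMMAOpSketch{M, N, K, CT, AT}}) where {M, N, K, CT, AT} = AT
accumulator_type(::Type{FPUOpSketch{M, N, K, CT, AT}}) where {M, N, K, CT, AT} = AT

# Creating an empty accumulator takes no data argument, so the element type
# cannot be inferred from arguments; it has to come from the AT typevar.
zero_accumulator(::Type{WMMAOpSketch{M, N, K, CT, AT}}) where {M, N, K, CT, AT} =
    zeros(AT, M, N)
zero_accumulator(::Type{FPUOpSketch{M, N, K, CT, AT}}) where {M, N, K, CT, AT} =
    zeros(AT, M, N)

# Example: Float16 compute type with a Float32 accumulator.
Op = WMMAOpSketch{16, 16, 16, Float16, Float32}
acc = zero_accumulator(Op)
```

Here a plain `Array` stands in for register-level storage; real kernel code would use fragments or tuples rather than heap-allocated arrays.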