Why is there a 16x8x16 TensorOp for tf32 but not a 16x16x8? #1382

Answered by thakkarV
RaulPPelaez asked this question in Q&A
The 16x16x8 instruction shape is only supported by WMMA (the first table you show documents the shapes supported by WMMA). CUTLASS uses the PTX API (`mma.sync.*`) for Ampere tensor core ops, which natively supports instruction shapes of 16x8x4 and 16x8x8 for tf32.
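To make the distinction concrete, here is a minimal sketch (not from the discussion itself) of how the native Ampere tf32 `mma.sync` with the 16x8x8 shape is issued via inline PTX; the fragment sizes (4 registers for A, 2 for B, 4 f32 registers for the accumulator) follow the PTX ISA layout for `mma.sync.aligned.m16n8k8` with tf32 operands. The function name is illustrative.

```cuda
#include <cstdint>

// Sketch: one warp collectively issuing the native Ampere tf32 MMA,
// shape m16n8k8 (M=16, N=8, K=8). Each thread holds its fragment slice:
// A fragment: 4 x 32-bit regs, B fragment: 2 x 32-bit regs,
// C/D accumulator: 4 x f32 regs, per the PTX ISA fragment layout.
__device__ void mma_tf32_m16n8k8(float d[4], const uint32_t a[4],
                                 const uint32_t b[2], const float c[4]) {
    asm volatile(
        "mma.sync.aligned.m16n8k8.row.col.f32.tf32.tf32.f32 "
        "{%0,%1,%2,%3}, {%4,%5,%6,%7}, {%8,%9}, {%10,%11,%12,%13};\n"
        : "=f"(d[0]), "=f"(d[1]), "=f"(d[2]), "=f"(d[3])
        : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
          "r"(b[0]), "r"(b[1]),
          "f"(c[0]), "f"(c[1]), "f"(c[2]), "f"(c[3]));
}
```

A 16x8x16 TensorOp tile in CUTLASS is then built by composing such PTX-level instructions, whereas a 16x16x8 shape exists only at the WMMA API level.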

Answer selected by RaulPPelaez