Why is there a 16x8x16 TensorOp for tf32 but not a 16x16x8? #1382
-
I am learning about CUTLASS with the ultimate goal of accelerating a batched multiply-add of small matrix ops (like 16x16x8). Most of the terminology in this library still eludes me, so please forgive me if I am asking something obvious. According to the CUDA docs, there exists only one tensor core operation for tf32, which is 16x16x8. OTOH, the CUTLASS docs list several available TensorOps. How come there is a 16x8x16 mode but not a 16x16x8 one?
Replies: 1 comment 1 reply
-
The 16x16x8 instruction shape is only supported by WMMA (the first table you show documents the shapes supported by WMMA). CUTLASS uses the PTX API (`mma.sync.*`) for Ampere tensor core ops, which for tf32 natively supports an instruction shape of 16x8x4 or 16x8x8.
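For reference, here is a minimal sketch (my own illustration, not from this discussion) of what the `mma.sync` path looks like for tf32 on sm_80+, using the m16n8k8 shape mentioned above. The function name `mma_m16n8k8_tf32` is hypothetical; the per-thread fragment sizes follow the PTX ISA layout for this shape, and distributing the tile across the warp (e.g. via `ldmatrix` or CUTLASS iterators) is assumed to happen elsewhere:

```cuda
#include <cstdint>

// One warp-wide tf32 MMA with shape m16n8k8 via inline PTX (sm_80+).
// Per-thread fragments (PTX ISA layout for mma.m16n8k8 with tf32):
//   a: 4 x tf32 values (as raw 32-bit registers) of the 16x8 A tile
//   b: 2 x tf32 values (as raw 32-bit registers) of the  8x8 B tile
//   c: 4 x f32 accumulators of the 16x8 C tile (read-modify-write)
__device__ void mma_m16n8k8_tf32(uint32_t const (&a)[4],
                                 uint32_t const (&b)[2],
                                 float (&c)[4]) {
    asm volatile(
        "mma.sync.aligned.m16n8k8.row.col.f32.tf32.tf32.f32 "
        "{%0,%1,%2,%3}, {%4,%5,%6,%7}, {%8,%9}, {%0,%1,%2,%3};\n"
        : "+f"(c[0]), "+f"(c[1]), "+f"(c[2]), "+f"(c[3])
        : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
          "r"(b[0]), "r"(b[1]));
}
```

Larger tf32 TensorOp shapes in the CUTLASS tables (such as 16x8x16) are, as I understand it, built by tiling these native instructions along k, and WMMA's 16x16x8 tf32 fragment is likewise lowered by the compiler onto the smaller native shapes.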