[QST] `uint4b / int4` Mixed Type GEMM using `Cutlass 3.x` for `Ampere` #1389

jeromeku · 2024-03-08T16:57:52Z

What is your question?
I'm attempting to implement something similar to this Hopper example using CUTLASS 3.x / CuTe for Ampere.

What changes need to be made to the mainloop regarding shapes and strides when copying from gmem -> smem, and then when partitioning for the smem -> register load using `tiled_mma`, specifically when dealing with sub-byte types? Sub-byte types (u4 / s4) are stored packed in units of `uint8`, so, as far as I can tell, defining a layout with `Shape<M, K>` and `Stride<K, 1>` and indexing it logically as `tensor(i, j)` will access every other element and row.
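To make the addressing concern concrete, here is a minimal host-side sketch in plain C++ (not CUTLASS or CuTe code). It assumes two 4-bit values per byte with the even-indexed element in the low nibble; the helper names `locate_u4`, `load_u4`, `store_u4`, and `load_u4_byte_strided` are hypothetical and exist only for illustration. The last helper shows what happens if the element stride is applied directly to byte storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: where logical element (i, j) of a row-major
// M x K matrix of 4-bit values lives inside packed uint8_t storage.
struct NibbleCoord {
    std::size_t byte;   // index into the uint8_t storage
    unsigned    shift;  // 0 for the low nibble, 4 for the high nibble
};

// Assumption: the even-indexed element occupies the low nibble of each byte.
inline NibbleCoord locate_u4(std::size_t i, std::size_t j, std::size_t K) {
    assert(j < K);
    std::size_t linear = i * K + j;  // offset measured in 4-bit elements
    return { linear / 2, static_cast<unsigned>((linear % 2) * 4) };
}

inline std::uint8_t load_u4(const std::vector<std::uint8_t>& storage,
                            std::size_t i, std::size_t j, std::size_t K) {
    NibbleCoord c = locate_u4(i, j, K);
    return static_cast<std::uint8_t>((storage[c.byte] >> c.shift) & 0xF);
}

inline void store_u4(std::vector<std::uint8_t>& storage,
                     std::size_t i, std::size_t j, std::size_t K,
                     std::uint8_t value) {
    NibbleCoord c = locate_u4(i, j, K);
    unsigned cleared = storage[c.byte] & ~(0xFu << c.shift);
    storage[c.byte] =
        static_cast<std::uint8_t>(cleared | ((value & 0xFu) << c.shift));
}

// The failure mode described above: the element offset i*K + j is used as a
// byte offset, so consecutive logical indices skip every other element.
inline std::uint8_t load_u4_byte_strided(const std::vector<std::uint8_t>& storage,
                                         std::size_t i, std::size_t j, std::size_t K) {
    std::size_t byte = i * K + j;
    assert(byte < storage.size());
    return static_cast<std::uint8_t>(storage[byte] & 0xF);
}
```

With a 2 x 4 matrix filled with the values 0 through 7, the packed storage is 4 bytes long, and the byte-strided read of `(0, 1)` returns element 2 rather than element 1.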

I've read PR #1190 and understand the need for shuffling to achieve the conformant tensor-core layout. However, when reviewing the Hopper implementation of mixed input gemm, it seems:

- `tiled_mma` is directly slicing the quantized type
- retiling for the smem -> register copy, per the usual mainloop pattern, however using a `DefaultCopy` atom?
- converting from the quantized type to the MMA type once in registers
- then performing the MMA

with no shuffling needed.
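The load, convert, then multiply-accumulate sequence listed above can be sketched as a scalar host-side reference. This is not the Hopper mainloop or any CUTLASS API: `float` stands in for fp16, `s4_to_int` and `mixed_gemm_ref` are hypothetical names, the packing order (even element in the low nibble) is an assumption, and scales and zero-points are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sign-extend a 4-bit two's-complement value held in the low nibble.
inline int s4_to_int(std::uint8_t nibble) {
    int v = nibble & 0xF;
    return v >= 8 ? v - 16 : v;
}

// Reference for C[m][n] = sum_k A[m][k] * B[n][k].
// A is M x K in float (stand-in for fp16); B is N x K in packed s4.
// Both operands are K-major, so each byte of B holds two neighbours along K.
inline std::vector<float> mixed_gemm_ref(const std::vector<float>& A,
                                         const std::vector<std::uint8_t>& B_packed,
                                         std::size_t M, std::size_t N, std::size_t K) {
    assert(K % 2 == 0);
    assert(A.size() == M * K);
    assert(B_packed.size() == N * K / 2);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; k += 2) {
                // 1. Load the quantized data as stored: one byte, two elements.
                std::uint8_t byte = B_packed[(n * K + k) / 2];
                // 2. Convert to the compute type only after the load.
                float b0 = static_cast<float>(s4_to_int(byte & 0xF));
                float b1 = static_cast<float>(s4_to_int(byte >> 4));
                // 3. Multiply-accumulate in the compute type.
                acc += A[m * K + k] * b0;
                acc += A[m * K + k + 1] * b1;
            }
            C[m * N + n] = acc;
        }
    }
    return C;
}
```

A reference like this is mainly useful for checking a kernel's output; it says nothing about which thread must hold which element for the tensor-core instruction, which is where the shuffling question arises.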

Can this same pattern be applied in an Ampere mainloop? Specifically, what changes to the smem layout, smem copy atoms, and `tiled_mma` are needed for sub-byte types (u4 / s4 x fp16 MMA)?


jeromeku added the `? - Needs Triage` and `question` labels Mar 8, 2024

jeromeku closed this as completed Mar 10, 2024
