[QST] uint4b / int4 Mixed Type GEMM using Cutlass 3.x for Ampere #1389
What is your question?
Attempting to implement something similar to this Hopper example using Cutlass 3.x / CuTe for Ampere.

What changes need to be made to the mainloop regarding shapes and strides when copying from gmem -> smem and then partitioning for loading from smem -> registers using tiled_mma, specifically when dealing with sub-byte types? Since sub-byte types (u4 / s4) are stored in units of uint8, when addressing into a tensor of such a type, defining a layout with Shape<M, K> and Stride<K, 1> will access every other element and row when logically indexed as tensor(i, j).

I've read PR #1190 and understand the need for shuffling to achieve the conformant tensor-core layout. However, when reviewing the Hopper implementation of mixed input gemm, it seems:
- tiled_mma is directly slicing the quantized type
- smem -> register copy, per the usual mainloop pattern, however using a DefaultCopy Atom?
- conversion from quant to mma type once in registers
- mma with no shuffling needed.
Can this same pattern be applied in an Ampere mainloop? What changes, specifically regarding the smem layout, smem copy atoms, and tiled_mma, are needed for sub-byte types (u4 / s4 x fp16 mma)?
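To make the addressing concern above concrete, here is a minimal host-side sketch in plain C++ (no CUTLASS or CuTe headers). It assumes two 4-bit values are packed per byte with the even-indexed element in the low nibble; that packing order is an assumption made for illustration. PackedU4Matrix, naive_byte_offset and naive_first_element are hypothetical names, not library APIs. The sketch contrasts correct element addressing (byte = offset / 2, nibble = offset % 2) with the mistake of using the element offset from Shape<M, K>, Stride<K, 1> directly as a uint8 offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a CUTLASS/CuTe type): a row-major M x K matrix of
// unsigned 4-bit values, packed two per byte. ASSUMPTION: the even-indexed
// element sits in the low nibble of each byte.
struct PackedU4Matrix {
    std::size_t rows, cols;           // logical extents, counted in 4-bit elements
    std::vector<std::uint8_t> bytes;  // ceil(rows * cols / 2) bytes of storage

    PackedU4Matrix(std::size_t m, std::size_t k)
        : rows(m), cols(k), bytes((m * k + 1) / 2, 0) {}

    // Logical element offset for Shape<M, K>, Stride<K, 1>.
    std::size_t element_offset(std::size_t i, std::size_t j) const {
        return i * cols + j;
    }

    // Correct read: halve the element offset to get the byte, then pick the nibble.
    std::uint8_t get(std::size_t i, std::size_t j) const {
        const std::size_t e = element_offset(i, j);
        const unsigned shift = static_cast<unsigned>(e & 1u) * 4u;
        return static_cast<std::uint8_t>((bytes[e >> 1] >> shift) & 0xFu);
    }

    // Correct write: replace only the addressed nibble, keep its neighbour.
    void set(std::size_t i, std::size_t j, std::uint8_t v) {
        const std::size_t e = element_offset(i, j);
        const unsigned shift = static_cast<unsigned>(e & 1u) * 4u;
        const unsigned keep = bytes[e >> 1] & ~(0xFu << shift);
        bytes[e >> 1] = static_cast<std::uint8_t>(keep | ((v & 0xFu) << shift));
    }
};

// The mistake described above: treating the element offset as a byte offset.
inline std::size_t naive_byte_offset(const PackedU4Matrix& m,
                                     std::size_t i, std::size_t j) {
    return m.element_offset(i, j);
}

// Index (in 4-bit elements) of the first value stored in that naively chosen byte.
inline std::size_t naive_first_element(const PackedU4Matrix& m,
                                       std::size_t i, std::size_t j) {
    return 2 * naive_byte_offset(m, i, j);
}
```

With a 4 x 8 matrix, naive indexing of (0, 1) lands on element 2 and naive indexing of (1, 0) lands on element 16, which is row 2, column 0. That is the "every other element and row" behaviour, and it is why the layout has to be expressed in 4-bit element units (or the strides halved when working in byte units).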