[QST] uint4b / int4 Mixed Type GEMM using Cutlass 3.x for Ampere #1389

Closed
jeromeku opened this issue Mar 8, 2024 · 0 comments

jeromeku commented Mar 8, 2024

What is your question?
I am attempting to implement something similar to this Hopper example using CUTLASS 3.x / CuTe for Ampere.

What changes need to be made to the mainloop's shapes and strides when copying from gmem -> smem, and then when partitioning for the smem -> register loads using tiled_mma, specifically for sub-byte types? Since sub-byte types (u4 / s4) are stored packed in units of uint8, a layout with Shape<M, K> and Stride<K, 1> applied to the uint8 storage will skip every other element and every other row when the tensor is logically indexed as tensor(i, j).
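To make the addressing concern concrete, here is a minimal host-side sketch in plain C++. It does not use CUTLASS or CuTe. All helper names (`element_index`, `load_u4`, `store_u4`, `load_s4`, `naive_load_u4`) are hypothetical, and the nibble order (even element in the low 4 bits of each byte) is an assumption made for illustration. It contrasts indexing in units of 4-bit elements with the mistaken approach of applying the same Stride<K, 1> offset directly to the uint8 storage.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only; not part of CUTLASS or CuTe.

// Linear index of logical coordinate (i, j) for Shape<M, K>, Stride<K, 1>,
// measured in 4-bit elements (not bytes).
inline std::size_t element_index(std::size_t i, std::size_t j, std::size_t K) {
    return i * K + j;
}

// Read the unsigned 4-bit element at elem_idx from packed uint8 storage.
// Assumption: even element indices live in the low nibble, odd in the high nibble.
inline std::uint8_t load_u4(const std::vector<std::uint8_t>& bytes, std::size_t elem_idx) {
    assert(elem_idx / 2 < bytes.size());
    const std::uint8_t byte = bytes[elem_idx / 2];
    return (elem_idx % 2 == 0) ? static_cast<std::uint8_t>(byte & 0x0F)
                               : static_cast<std::uint8_t>(byte >> 4);
}

// Write a 4-bit value at elem_idx without disturbing the neighbouring nibble.
inline void store_u4(std::vector<std::uint8_t>& bytes, std::size_t elem_idx, std::uint8_t value) {
    assert(elem_idx / 2 < bytes.size());
    std::uint8_t& byte = bytes[elem_idx / 2];
    if (elem_idx % 2 == 0) {
        byte = static_cast<std::uint8_t>((byte & 0xF0) | (value & 0x0F));
    } else {
        byte = static_cast<std::uint8_t>((byte & 0x0F) | ((value & 0x0F) << 4));
    }
}

// Read the element as a signed 4-bit value (two's complement sign extension).
inline std::int8_t load_s4(const std::vector<std::uint8_t>& bytes, std::size_t elem_idx) {
    const int v = load_u4(bytes, elem_idx);
    return static_cast<std::int8_t>((v ^ 0x8) - 0x8);
}

// The mistake described above: the element offset is used as a byte offset,
// so offset n lands on byte n, which holds elements 2n and 2n + 1.
inline std::uint8_t naive_load_u4(const std::vector<std::uint8_t>& bytes, std::size_t elem_idx) {
    assert(elem_idx < bytes.size());
    return static_cast<std::uint8_t>(bytes[elem_idx] & 0x0F);
}
```

With a 4 x 4 tensor whose element e stores the value e, `naive_load_u4` at (0, 1) lands on element 2 and at (1, 0) lands on element 8 (row 2), which is the "every other element and row" behaviour. Any layout for the real kernel has to account for this factor of two between element and byte offsets.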

I've read PR #1190 and understand the need for shuffling to achieve the conformant tensor-core layout. However, when reviewing the Hopper implementation of mixed-input GEMM, it seems:

Can this same pattern be applied in an Ampere mainloop? Specifically, what changes to the smem layout, smem copy atoms, and tiled_mma are needed for sub-byte types (u4 / s4 x fp16 MMA)?
