[New Feature] CUTLASS kernels for w4a8 quantization #64
Comments
Working on this: NVIDIA/cutlass#1413.
Great work so far on integrating this. Do you have plans to re-implement this functionality for pre-Hopper architectures? Would be happy to help adapt.
(Please send further comments to the PR mentioned above - I think it makes most sense to discuss CUTLASS features on the CUTLASS GitHub pages.) As can be seen from my PR, this feature is implemented the same way as
Closing. Ref #880
We plan to add QAT for LLMs to torchao (as mentioned in the original RFC here #47)
For this to run efficiently on the GPU we'd need kernel support for W4A8 quantization (int4 weights, int8 activations).
Other places where this has been raised before:
NVIDIA/cutlass#1316,
NVIDIA/cutlass#1370
cc @andrewor14
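For reference, a minimal sketch of the W4A8 semantics the requested kernel would implement (this is an illustration in plain Python, not torchao's or CUTLASS's actual implementation; the function names and scales are hypothetical): weights are quantized to the int4 range, activations to the int8 range, the matmul accumulates in integers, and the result is dequantized by the product of the two scales.

```python
# Hedged sketch: reference semantics of a W4A8 matvec
# (int4 weights, int8 activations, integer accumulation).
# Names and scale values here are illustrative, not from torchao.

def quantize(vals, scale, qmin, qmax):
    """Symmetric round-to-nearest quantization with clamping."""
    return [max(qmin, min(qmax, round(v / scale))) for v in vals]

def w4a8_matvec(w_rows, x, w_scale, x_scale):
    """Dequantized result of an int4-weight / int8-activation matvec."""
    w_q = [quantize(row, w_scale, -8, 7) for row in w_rows]   # int4 range
    x_q = quantize(x, x_scale, -128, 127)                     # int8 range
    # integer accumulation (int32 on hardware)
    acc = [sum(wq * xq for wq, xq in zip(row, x_q)) for row in w_q]
    return [a * w_scale * x_scale for a in acc]               # dequantize

y = w4a8_matvec([[0.5, -0.25], [1.0, 0.75]], [2.0, -1.0],
                w_scale=0.125, x_scale=0.05)
```

Note that in this example the weight 1.0 quantizes to 8 and is clamped to 7, showing the saturation behavior of the narrow int4 range; a fused GPU kernel avoids materializing the dequantized weights at all.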