Plans for 8da4w quantization #883
Comments
Hi @jerryzh168, I saw your pointer here:
This was used for ExecuTorch before, but it seems that people are adding kernels here: #880. We are open to adding kernels for this.
Got it. Thanks for the clarification!
@Xia-Weiwen that link is talking about the quantizer API since we are updating to the ao/torchao/quantization/quant_api.py Line 83 in 8236a87
Thanks for the pointers!
Thanks for clarifying, @jerryzh168!
Hi,
From #430, it seems that 8da4w is primarily for ExecuTorch and is set to be deprecated. Are there any plans to enable it for CUDA and CPU as well, so that int4 weights could be converted to int8 just before computation?
Thanks!
cc @jerryzh168
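For context, here is a minimal pure-Python sketch of the idea behind the question: 8da4w stores weights as 4-bit integers, quantizes activations to 8 bits dynamically at run time, and widens the int4 weights to int8 just before an integer dot product. It assumes symmetric per-group weight quantization and symmetric per-token activation quantization. All function names are hypothetical; this is not torchao's API or kernel implementation.

```python
# Illustrative sketch only: hypothetical helper names, not torchao's API.

def quantize_weights_int4(weights, group_size):
    """Symmetric per-group quantization of one weight row to the int4 range [-8, 7]."""
    q_weights, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        q_weights.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q_weights, scales


def quantize_activations_int8(x):
    """Dynamic symmetric quantization of one activation vector to [-127, 127]."""
    max_abs = max(abs(v) for v in x)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [max(-127, min(127, round(v / scale))) for v in x], scale


def dot_8da4w(x, q_weights, w_scales, group_size):
    """Dot product with int8 activations and int4 weights treated as int8."""
    q_x, x_scale = quantize_activations_int8(x)
    total = 0.0
    for g, w_scale in enumerate(w_scales):
        start = g * group_size
        # Integer accumulation per group; the int4 values fit in int8 unchanged.
        acc = sum(
            a * w
            for a, w in zip(q_x[start:start + group_size],
                            q_weights[start:start + group_size])
        )
        # Rescale the integer result back to floating point.
        total += acc * x_scale * w_scale
    return total


weights = [0.5, -1.0, 0.25, 0.75]
x = [1.0, 2.0, -1.0, 0.5]
q_w, w_scales = quantize_weights_int4(weights, group_size=2)
approx = dot_8da4w(x, q_w, w_scales, group_size=2)
exact = sum(a * w for a, w in zip(x, weights))
print(approx, exact)
```

The result only approximates the floating-point dot product, since both operands are rounded. A real CUDA or CPU kernel would additionally pack two int4 values per byte and use hardware int8 matmul instructions, which this sketch does not attempt.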