Feedback on `quantize()` API #384

Comments
So my understanding of …

Regarding the …

On your point around compilation: it is indeed unclear when a user should vs. must compile. We need to communicate the benefits, and the necessity of compilation might drive users back to a module-swap API.

Using the new …
@msaroufim I am working on support for unwrapping/wrapping nested tensor subclasses in PT2. In general, we expect to be able to preserve the tensor subclasses if users target our training IR, so they shouldn't have to rely on `unwrap_tensor_subclass()`.
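For illustration, a sketch of the flow that support would enable, assuming the `int8wo()` constructor named in this thread; the subclass-preserving export is the in-progress behavior described above, not something current releases guarantee:

```python
import torch
from torchao.quantization.quant_api import quantize, int8wo

model = torch.nn.Sequential(torch.nn.Linear(64, 64))
model = quantize(model, int8wo())

# The expectation described above: once nested-subclass support lands in PT2,
# export keeps the quantized tensor subclasses in the program, with no prior
# unwrap_tensor_subclass() call.
ep = torch.export.export(model, (torch.randn(2, 64),))
print(ep)
```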
Hi, I noticed that the GPTQ-related API was marked to be moved to prototype. Is there any alternative API to use, or are there any plans to support GPTQ formally?
@gau-nernst thanks for the feedback.

We are thinking of deprecating GPTQ when we make HQQ work. cc @HDCharles to confirm that HQQ is better than GPTQ in general. Can you also describe your use case for GPTQ as well?
@yiliu30 To add on to what @jerryzh168 is saying: we haven't seen a lot of people interested in this API at the moment, so it's not something we've invested a ton of effort into. There are some limitations in the existing API/implementation that make it not work on some parts of some models unless they're carefully handled (https://github.com/pytorch/ao/blob/main/torchao/_models/llama/model.py#L89-L96). We could fix those if we rewrote the whole thing, but until we do that, it hasn't been tested as thoroughly and isn't expected to work as widely as something like int8 weight-only quantization. If you have a significant use case for GPTQ, that may change what we do with it.
@jerryzh168 @HDCharles My reason for keeping GPTQ support is that it is quite popular within the community :). For instance, the Hugging Face Hub currently hosts 3000+ GPTQ models.
@jerryzh168 Just visiting this issue again, particularly about `unwrap_tensor_subclass()` …
The main thing is that it makes things a bit harder to debug, I think. We'll be removing it soon though, within the next two days, stay tuned. We are waiting for pytorch/pytorch#127431 to land, and then I'll put up a PR to remove it.
@jerryzh168 That's good to hear! However, users of previous versions of PyTorch (e.g. v2.3) will still need to unwrap the tensor subclass? Might not be that important.
Yeah, that's true. I hope at some point we can just stop supporting 2.2 and 2.3 so we can deprecate the old APIs as well. Also, we have an updated timeline of weeks to a month; see #462 (comment) for more details.
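A rough illustration of gating the extra step on the PyTorch version; the 2.4 cutoff is an assumption based on the fix above not being available in v2.3:

```python
import torch
from torchao.quantization.quant_api import quantize, int8wo
from torchao.quantization.utils import unwrap_tensor_subclass

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024))
model = quantize(model, int8wo())

# Assumption: the fix referenced above (pytorch/pytorch#127431) first ships
# in PyTorch 2.4, so only older versions still need the unwrap step.
major, minor = (int(v) for v in torch.__version__.split(".")[:2])
if (major, minor) < (2, 4):
    model = unwrap_tensor_subclass(model)

model = torch.compile(model)
```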
Addressed in pytorch#400. Summary: Addressing feedback from pytorch#384 and pytorch#375. Test Plan: regression tests (python test/quantization/test_quant_api.py, python test/integration/test_integration.py).
Original feedback from @gau-nernst:

Previously we did this:
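A minimal sketch of the old flow, assuming the module-swap-style helper `change_linear_weights_to_int8_woqtensors()` from earlier torchao releases:

```python
import torch
from torchao.quantization.quant_api import change_linear_weights_to_int8_woqtensors

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).cuda().to(torch.bfloat16)

# One call, in place; no unwrap step needed before compiling.
change_linear_weights_to_int8_woqtensors(model)
model = torch.compile(model, mode="max-autotune")
```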
With the new quantization API, we have to do this:
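A minimal sketch of the new flow, assuming the `quantize()`/`int8wo()` names used in this thread and the `torchao.quantization.utils` import location that the last bullet below complains about:

```python
import torch
from torchao.quantization.quant_api import quantize, int8wo
from torchao.quantization.utils import unwrap_tensor_subclass

model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).cuda().to(torch.bfloat16)

# New-style API: pass a callable that decides how each weight is quantized.
model = quantize(model, int8wo())

# Extra step this feedback is about: required before torch.compile because
# the quantized weights are tensor subclasses.
model = unwrap_tensor_subclass(model)
model = torch.compile(model, mode="max-autotune")
```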
I think the new API is less user-friendly than the previous one.
- `int8wo()`, `int4wo()` is a bit unintuitive. I understand it is a mechanism to pass params like group size to the quantization. Alternatives: a full-blown class with a `__call__()` method, e.g. `Int8WeightOnlyConfig` (kinda verbose, but the intention is clear); or just pass quant params as extra args/kwargs, e.g. `quantize("int4wo", groupsize=128)` (see the sketch after this list).
- It is not clear what `unwrap_tensor_subclass()` does. Also, why do we need it now to compile the model, but not previously?
- `unwrap_tensor_subclass()` should be imported from `torchao.utils` or `torchao.quantization.quant_api`, not `torchao.quantization.utils` (https://github.com/pytorch/ao/tree/main/torchao/quantization).

@jerryzh168
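For illustration, hedged sketches of the two alternatives proposed in the first bullet; `Int8WeightOnlyConfig` and the string-plus-kwargs overload are hypothetical, not existing torchao APIs:

```python
import torch

# Alternative 1 (hypothetical): a config class whose name states the intent,
# with __call__() so it still plugs into quantize(model, <callable>).
class Int8WeightOnlyConfig:
    def __init__(self, group_size=None):
        self.group_size = group_size

    def __call__(self, weight: torch.Tensor) -> torch.Tensor:
        # Would return the int8 weight-only quantized subclass for `weight`.
        raise NotImplementedError

# usage: quantize(model, Int8WeightOnlyConfig(group_size=128))

# Alternative 2 (hypothetical): a string key plus quant params as kwargs.
# usage: quantize(model, "int4wo", groupsize=128)
```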