Refactor QAT to use common fake_quantize_affine primitive #527
Conversation
Summary: Currently there are two QAT quantizers, 8da4w and 4w. Today these use different autograd functions to represent their fake quantization numerics, but this is not scalable because new QAT quantizers may introduce yet more divergent code paths. To address this, this commit refactors both quantizers to use the common fake_quantize_affine primitive.

Test Plan:
python test/quantization/test_qat.py

Reviewers: jerryzh168

Subscribers: jerryzh168, supriyar, msaroufim
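For context, the shared primitive described above performs affine fake quantization: values are quantized to integers and immediately dequantized back to float, so the model trains against quantization error. A minimal pure-Python sketch of that numeric behavior follows; it is hypothetical and for illustration only (torchao's actual `fake_quantize_affine` operates on tensors and additionally supports block sizes and zero-point domains):

```python
# Hypothetical sketch of affine fake quantization (quantize-then-dequantize).
# Not torchao's actual implementation, which is tensor-based.
def fake_quantize_affine(x, scale, zero_point, quant_min, quant_max):
    """Fake-quantize a list of floats with affine quantization parameters."""
    out = []
    for v in x:
        q = round(v / scale) + zero_point          # quantize to an integer
        q = max(quant_min, min(quant_max, q))      # clamp to the integer range
        out.append((q - zero_point) * scale)       # dequantize back to float
    return out
```

With `scale=0.25`, `zero_point=0`, and a 4-bit signed range of `[-8, 7]`, a representable value like `0.5` round-trips exactly, while an out-of-range value like `10.0` is clamped to `7 * 0.25 = 1.75`.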
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/527

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures
As of commit 8486207 with merge base 6dd82d8: This comment was automatically generated by Dr. CI and updates every 15 minutes.
@@ -25,7 +25,10 @@
     ZeroPointDomain,
 )
 from torchao.quantization.unified import TwoStepQuantizer
-from torchao.quantization.utils import get_group_qparams_symmetric
+from torchao.quantization.utils import (
+    _get_per_token_block_size,
if it's helpful we could have a general util like:

def get_block_size(granularity, **kw_params) -> Callable:
    if granularity == Granularity.PER_BLOCK:
        ...
    elif granularity == Granularity.PER_TOKEN:
        ...
    ...

block_size = get_block_size(Granularity.PER_TOKEN)(x)
Sounds good, let's do that separately