[Quant][PT2E] Enable linear and linear-unary post-op quant recipe for x86 inductor quantizer #106781
Conversation
… x86 inductor quantizer [ghstack-poisoned]
🔗 See artifacts and rendered test results at hud.pytorch.org/pr/106781. As of commit 8abd7f8 with merge base 808e088: one job failed, but it was likely due to flakiness present on trunk and has been marked as unstable, so you can merge normally (1 unrelated failure). (This comment was automatically generated by Dr. CI.)
… recipe for x86 inductor quantizer" [ghstack-poisoned]

**Summary** Add linear and linear-unary post-op quantization recipe to x86 inductor quantizer, for PT2E with Inductor. With this, the quantization path will add the `quant-dequant` pattern for linear and linear-unary post ops.

**Test plan**
```
python test/test_quantization.py -k test_linear_with_quantizer_api
python test/test_quantization.py -k test_linear_unary_with_quantizer_api
```
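For readers unfamiliar with the `quant-dequant` pattern mentioned in the summary: it amounts to inserting a quantize-per-tensor node followed by a dequantize-per-tensor node around an op's inputs, which downstream passes can later fuse into real quantized kernels. A minimal pure-Python sketch of that round trip (the scale and zero-point values are illustrative, not taken from the PR):

```python
def quantize_per_tensor(x, scale, zero_point, qmin=-128, qmax=127):
    """Quantize a list of floats into the int8 range with a single scale/zero-point."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in x]

def dequantize_per_tensor(q, scale, zero_point):
    """Map the quantized integers back to approximate floats."""
    return [(v - zero_point) * scale for v in q]

x = [0.1, -0.5, 1.25]
scale, zp = 0.01, 0
q = quantize_per_tensor(x, scale, zp)        # integer representation
x_approx = dequantize_per_tensor(q, scale, zp)  # close to the original floats
```

The point of inserting both nodes back-to-back is that the graph stays numerically (almost) equivalent to the float graph until a later lowering pass consumes the pair.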
… x86 inductor quantizer ghstack-source-id: 9a10b20bc82f3a6e8f0249aa2453b06327ed427c Pull Request resolved: pytorch#106781
@jerryzh168 Could you also kindly take a look at this PR?
```python
@@ -400,3 +417,80 @@ def test_conv2d_serials_binary_unary_with_quantizer_api(self):
            node_occurrence,
            node_list,
        )

    @skipIfNoX86
    def test_linear_with_quantizer_api(self):
```
nit: you can remove `_with_quantizer_api` now
Thanks. It's removed.
```python
        )

    @skipIfNoX86
    def test_linear_unary_with_quantizer_api(self):
```
same here
Thanks. It's removed.
```python
        node_list = [
            torch.ops.quantized_decomposed.quantize_per_tensor.default,
            torch.ops.quantized_decomposed.dequantize_per_tensor.default,
            torch.ops.aten.addmm.default if use_bias else torch.ops.aten.mm.default,
```
if you don't need to land this before branch cut, I think you could switch to `capture_pre_autograd_graph` for capture, and you will see a single `aten.linear` op instead
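For context on why the expected node above depends on `use_bias`: under the older capture path, `nn.Linear` decomposes to `aten.addmm(bias, x, W_t)` when a bias is present and to `aten.mm(x, W_t)` otherwise. A plain-Python sketch of the two semantics (these helper functions are illustrative, not the actual ATen implementations):

```python
def mm(a, b):
    """Matrix-multiply two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def addmm(bias, a, b):
    """bias + a @ b, broadcasting the bias vector across output rows."""
    return [[v + w for v, w in zip(bias, row)] for row in mm(a, b)]

x = [[1.0, 2.0]]        # one input sample, in_features=2
w_t = [[3.0], [4.0]]    # weight already transposed: (in_features, out_features)
bias = [0.5]

no_bias_out = mm(x, w_t)          # [[11.0]]
with_bias_out = addmm(bias, x, w_t)  # [[11.5]]
```

This is why the test matches `addmm.default` in one branch and `mm.default` in the other; with `capture_pre_autograd_graph`, both cases would instead surface as a single `aten.linear` op.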
Hi @jerryzh168, thanks for the suggestion. We hope to land this before the code freeze, and switching to the new API now would be difficult since the freeze is near. Is it OK to land this first and then switch to the new API later?
sure, that's fine
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…ant folding (#106782)

**Summary** To realize weight prepack for quantized linear, we replace the following pattern
```
int8 activation
  |
dequant_per_tensor
  |
mm/addmm <- t <- dequant_per_channel <- int8_weight
```
with
```
int8 activation
  |
onednn.qlinear_pointwise <- onednn.qlinear_prepack <- int8_weight
```
And we register the weight prepack path inside Inductor constant folding. Constant folding evaluates the prepack op and replaces it with the prepacked weight (a constant parameter).

**Test plan** `python test/inductor/test_mkldnn_pattern_matcher.py -k test_qlinear_unary`

Pull Request resolved: #106782. Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/eellison. ghstack dependencies: #105818, #106781.
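The key idea behind registering prepack inside constant folding: since the weight is a compile-time constant, an op whose inputs are all constants can be evaluated once and replaced by its result. A toy sketch of that folding step over a flat list-of-nodes graph (the graph representation and the stand-in `qlinear_prepack` are invented for illustration; the real pass operates on the FX graph):

```python
def constant_fold(graph, constants):
    """Evaluate any node whose inputs are all constants; keep the rest."""
    remaining = []
    for name, op, args in graph:
        if all(a in constants for a in args):
            # Run the op now, at compile time, and store its result as a constant.
            constants[name] = op(*(constants[a] for a in args))
        else:
            remaining.append((name, op, args))
    return remaining, constants

def qlinear_prepack(w):
    # Stand-in for onednn.qlinear_prepack: pretend "prepacking" is a transpose.
    return [list(col) for col in zip(*w)]

graph = [
    ("packed_w", qlinear_prepack, ("int8_weight",)),           # foldable
    ("out", lambda x, w: None, ("activation", "packed_w")),    # needs runtime input
]
constants = {"int8_weight": [[1, 2], [3, 4]]}
folded, constants = constant_fold(graph, constants)
```

After folding, `packed_w` has become a stored constant and only the runtime-dependent `out` node remains in the graph, mirroring how the prepack op disappears from the compiled model.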
…ductor (#106934)

**Summary** Enable lowering of quantized linear in Inductor.

**Test plan** `python test/inductor/test_mkldnn_pattern_matcher.py -k test_qlinear_unary`

Pull Request resolved: #106934. Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/eellison. ghstack dependencies: #105818, #106781, #106782.
**Summary** Previously the UT of dequant promotion in Inductor only tested convolution. Now add a linear case to the UT. This is for quantization PT2E with Inductor.

**Test plan** `python test/inductor/test_mkldnn_pattern_matcher.py -k test_dequant_promotion`

Pull Request resolved: #106935. Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/eellison. ghstack dependencies: #105818, #106781, #106782, #106934.
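Background on dequant promotion, the pass exercised by that UT: when one dequantize node feeds several consumers, it is duplicated so each fusion pattern sees its own private `dequant` and can match independently. A toy sketch assuming nodes appear in topological order (the tuple-based graph encoding here is invented for illustration):

```python
def promote_dequant(graph):
    """Duplicate any 'dequant' node with multiple users, one copy per user."""
    users = {}
    for name, op, args in graph:
        for a in args:
            users.setdefault(a, []).append(name)

    out, rename = [], {}
    for name, op, args in graph:
        # Rewrite inputs so each consumer reads its own private copy.
        args = tuple(rename.get((a, name), a) for a in args)
        if op == "dequant" and len(users.get(name, ())) > 1:
            for i, user in enumerate(users[name]):
                copy = f"{name}_{i}"
                out.append((copy, op, args))
                rename[(name, user)] = copy
        else:
            out.append((name, op, args))
    return out

graph = [
    ("dq", "dequant", ("q_x",)),
    ("lin1", "linear", ("dq",)),
    ("lin2", "linear", ("dq",)),
]
promoted = promote_dequant(graph)
```

After promotion each `linear` consumes its own `dequant` copy, so a linear-fusion pattern matcher can rewrite one consumer without invalidating the other.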
Stack from ghstack (oldest at bottom):
**Summary**
Add linear and linear-unary post-op quantization recipe to x86 inductor quantizer, for PT2E with Inductor. With this, the quantization path will add the `quant-dequant` pattern for linear and linear-unary post ops.

**Test plan**
```
python test/test_quantization.py -k test_linear_with_quantizer_api
python test/test_quantization.py -k test_linear_unary_with_quantizer_api
```