
[Quant][PT2E] Enable linear and linear-unary post-op quant recipe for x86 inductor quantizer #106781

Closed
wants to merge 14 commits

Conversation

Xia-Weiwen
Collaborator

@Xia-Weiwen Xia-Weiwen commented Aug 8, 2023

Stack from ghstack (oldest at bottom):

Summary
Adds a linear and linear-unary post-op quantization recipe to the x86 Inductor quantizer, for PT2E with Inductor. With this, the quantization path will insert the quant-dequant pattern around linear ops and linear ops with a unary post op.

Test plan
```
python test/test_quantization.py -k test_linear_with_quantizer_api
python test/test_quantization.py -k test_linear_unary_with_quantizer_api
```
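The quant-dequant pattern that the recipe inserts can be illustrated with a minimal, self-contained sketch of per-tensor affine quantization. This is plain Python for illustration only; the real pass inserts `torch.ops.quantized_decomposed.quantize_per_tensor` / `dequantize_per_tensor` nodes, and the function names below are made up for the example:

```python
# Illustrative sketch of per-tensor affine quantize/dequantize, the
# pattern the recipe inserts around linear ops. Not the real torch ops.

def quantize_per_tensor(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to the int8 range: q = clamp(round(x / scale) + zp)."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize_per_tensor(qvalues, scale, zero_point):
    """Map integers back to floats: x_hat = (q - zp) * scale."""
    return [(q - zero_point) * scale for q in qvalues]

x = [0.5, -1.25, 2.0, 0.0]
scale, zp = 0.25, 0
q = quantize_per_tensor(x, scale, zp)
x_hat = dequantize_per_tensor(q, scale, zp)
# Round-trip error is bounded by half a quantization step.
assert all(abs(a - b) <= scale / 2 for a, b in zip(x, x_hat))
```

The downstream Inductor passes then pattern-match these quant-dequant pairs around the linear op and fuse them into quantized kernels.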

@pytorch-bot

pytorch-bot bot commented Aug 8, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106781

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 8abd7f8 with merge base 808e088:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

leslie-fang-intel pushed a commit to leslie-fang-intel/pytorch that referenced this pull request Aug 14, 2023
… x86 inductor quantizer

ghstack-source-id: 9a10b20bc82f3a6e8f0249aa2453b06327ed427c
Pull Request resolved: pytorch#106781
leslie-fang-intel pushed a commit to leslie-fang-intel/pytorch that referenced this pull request Aug 16, 2023
… x86 inductor quantizer

ghstack-source-id: 9a10b20bc82f3a6e8f0249aa2453b06327ed427c
Pull Request resolved: pytorch#106781
@leslie-fang-intel leslie-fang-intel marked this pull request as ready for review August 18, 2023 08:46
@Xia-Weiwen Xia-Weiwen requested a review from jgong5 August 22, 2023 05:33
@leslie-fang-intel
Collaborator

@jerryzh168 Could you also kindly take a look at this PR?

@@ -400,3 +417,80 @@ def test_conv2d_serials_binary_unary_with_quantizer_api(self):
node_occurrence,
node_list,
)

@skipIfNoX86
def test_linear_with_quantizer_api(self):
Contributor

nit: you can remove _with_quantizer_api now

Collaborator Author

Thanks. It's removed.

)

@skipIfNoX86
def test_linear_unary_with_quantizer_api(self):
Contributor

same here

Collaborator Author

Thanks. It's removed.

node_list = [
torch.ops.quantized_decomposed.quantize_per_tensor.default,
torch.ops.quantized_decomposed.dequantize_per_tensor.default,
torch.ops.aten.addmm.default if use_bias else torch.ops.aten.mm.default,
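The `addmm`-vs-`mm` branch in the expected node list reflects how `nn.Linear` decomposes after tracing through autograd: with a bias it becomes `addmm(bias, x, W^T)`, without one it becomes `mm(x, W^T)`. A tiny pure-Python sketch of the two decompositions (illustrative only, not the real aten implementations):

```python
# Why the test expects addmm when use_bias is True and mm otherwise:
# linear(x, W, b) decomposes to addmm(b, x, W^T) or mm(x, W^T).

def mm(a, b):
    """Plain matrix multiply of row-major nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def addmm(bias, a, b):
    """mm(a, b) with a per-output-column bias added."""
    return [[bias[j] + v for j, v in enumerate(row)] for row in mm(a, b)]

def linear(x, weight, bias=None):
    wt = [list(col) for col in zip(*weight)]  # transpose: W^T
    return addmm(bias, x, wt) if bias is not None else mm(x, wt)

x = [[1.0, 2.0]]
w = [[3.0, 4.0], [5.0, 6.0]]   # 2 output features, 2 input features
assert linear(x, w) == [[11.0, 17.0]]             # mm path (no bias)
assert linear(x, w, [0.5, -0.5]) == [[11.5, 16.5]]  # addmm path (bias)
```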
Contributor

if you don't need to land this before branch cut, I think you could switch to capture_pre_autograd_graph for capture and you will see a single aten.linear op instead

Collaborator Author

Hi @jerryzh168. Thanks for the suggestion. We hope to land this before the code freeze, and switching to the new API on such short notice would be difficult. Is it OK to land this first and then switch to the new API later?

Contributor

sure, that's fine

@leslie-fang-intel leslie-fang-intel added the ciflow/trunk Trigger trunk jobs on your pull request label Aug 26, 2023
@Xia-Weiwen
Collaborator Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


pytorchmergebot pushed a commit that referenced this pull request Aug 27, 2023
…ant folding (#106782)

**Summary**
To implement weight prepacking for quantized linear, we replace the following pattern
```
int8 activation
      |
dequant_per_tensor
      |
mm/addmm <- t <- dequant_per_channel <- int8_weight
```
with
```
int8 activation
  |
onednn.qlinear_pointwise <- onednn.qlinear_prepack <- int8_weight
```
The weight-prepack path is registered inside Inductor constant folding: constant folding evaluates the prepack op and replaces it with the prepacked weight (a constant parameter).
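The key idea is that the prepack op's inputs (the int8 weight and its quantization parameters) are all constants, so constant folding can evaluate it once at compile time. A minimal sketch of that mechanism in plain Python; the function names here stand in for `onednn.qlinear_prepack` and Inductor's constant folder and are not the real APIs:

```python
# Sketch of weight prepacking via constant folding: because the prepack
# op consumes only constants, it can be evaluated once at compile time
# and replaced by its (constant) result. Names are illustrative.

def prepack_weight(int8_weight, scale):
    """Stand-in for the prepack op: dequantize and transpose the weight
    once, ahead of time, so runtime only does x @ W^T."""
    deq = [[q * scale for q in row] for row in int8_weight]
    return [list(col) for col in zip(*deq)]  # W^T

def constant_fold(graph_constants, op, *arg_names):
    """If every input to `op` is a known graph constant, evaluate the op
    now and return the result as a new constant parameter."""
    args = [graph_constants[name] for name in arg_names]
    return op(*args)

consts = {"int8_weight": [[10, -20], [30, 40]], "scale": 0.5}
packed = constant_fold(consts, prepack_weight, "int8_weight", "scale")
assert packed == [[5.0, 15.0], [-10.0, 20.0]]
```

At runtime the graph then contains only the prepacked constant and the fused quantized-linear op, with no per-iteration dequantize of the weight.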

**Test plan**
python test/inductor/test_mkldnn_pattern_matcher.py -k test_qlinear_unary

Pull Request resolved: #106782
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/eellison
ghstack dependencies: #105818, #106781
pytorchmergebot pushed a commit that referenced this pull request Aug 27, 2023
…ductor (#106934)

**Summary**
Enable lowering of quantized linear in Inductor

**Test plan**
python test/inductor/test_mkldnn_pattern_matcher.py -k test_qlinear_unary

Pull Request resolved: #106934
Approved by: https://github.com/jgong5, https://github.com/leslie-fang-intel, https://github.com/eellison
ghstack dependencies: #105818, #106781, #106782
pytorchmergebot pushed a commit that referenced this pull request Aug 27, 2023
**Summary**
Previously, the unit test for dequant promotion in Inductor covered only convolution. This adds a linear case to the test. This is for quantization PT2E with Inductor.

**Test plan**
python test/inductor/test_mkldnn_pattern_matcher.py -k test_dequant_promotion

Pull Request resolved: #106935
Approved by: https://github.com/leslie-fang-intel, https://github.com/jgong5, https://github.com/eellison
ghstack dependencies: #105818, #106781, #106782, #106934
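Dequant promotion addresses the case where one dequantize node feeds several consumers: the node is duplicated per consumer so that each (dequant → op) pair can be pattern-matched and fused independently. A toy sketch of that transform on a dictionary-based graph (the representation and names are invented for illustration, not Inductor's actual IR):

```python
# Toy sketch of dequant promotion: give every consumer of a shared
# dequantize node its own private copy so fusion can match each
# (dequant -> linear) pair separately. Illustrative only.

def promote_dequant(nodes, edges):
    """nodes: name -> op kind; edges: consumer name -> list of input names."""
    new_nodes, new_edges = dict(nodes), {}
    counter = 0
    for consumer, inputs in edges.items():
        new_inputs = []
        for inp in inputs:
            users = sum(inp in ins for ins in edges.values())
            if nodes.get(inp) == "dequantize" and users > 1:
                counter += 1
                clone = f"{inp}_copy{counter}"
                new_nodes[clone] = "dequantize"  # private copy for this consumer
                new_inputs.append(clone)
            else:
                new_inputs.append(inp)
        new_edges[consumer] = new_inputs
    return new_nodes, new_edges

nodes = {"dq": "dequantize", "linear1": "linear", "linear2": "linear"}
edges = {"linear1": ["dq"], "linear2": ["dq"]}
n2, e2 = promote_dequant(nodes, edges)
assert e2["linear1"][0] != e2["linear2"][0]  # each linear got its own dequant
```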
@facebook-github-bot facebook-github-bot deleted the gh/Xia-Weiwen/18/head branch August 30, 2023 14:16
Labels
ciflow/trunk · Merged · open source · release notes: quantization

6 participants