Match torch.fake_quantize numerics in 8da4w QAT #229
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/229
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit 54c48d1 with merge base 3dd16c9.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
By the way, a few existing tests will fail because we haven't made the corresponding changes in PyTorch core yet.
Summary: Follow-up to pytorch/ao#229. This resolves the difference between `input.div(scales)` and `input.mul(1.0 / scales)`, which results in small numerical discrepancies on some inputs.

Test Plan:
python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_channel_group
python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_token

Reviewers: jerryzh168
Subscribers: jerryzh168, supriyar

[ghstack-poisoned]
These seem to be small differences and I don't expect they will actually cause large errors.
…125781)

Summary: Follow-up to pytorch/ao#229. This resolves the difference between `input.div(scales)` and `input.mul(1.0 / scales)`, which results in small numerical discrepancies on some inputs.

Test Plan:
python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_channel_group
python test/test_quantization.py TestQuantizedTensor.test_decomposed_quantize_per_token

Reviewers: jerryzh168
Subscribers: jerryzh168, supriyar

Pull Request resolved: #125781
Approved by: https://github.com/jerryzh168
Can we guard the QAT feature itself under `TORCH_VERSION_AFTER_2_4`?
When adding this guard, please provide a clear error message by adding an empty stub for users of older versions.
Yup, both done.
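For context, the general shape of such a guard is roughly as follows. This is a hedged sketch only, not code from this PR: the flag is computed locally here because the exact torchao import path isn't shown above, and the class name and guard placement are illustrative.

```python
import torch

# Illustrative version flag: torchao defines a TORCH_VERSION_AFTER_2_4 constant,
# but its import path is not shown here, so compute an equivalent locally.
_major, _minor = (int(v) for v in torch.__version__.split(".")[:2])
TORCH_VERSION_AFTER_2_4 = (_major, _minor) >= (2, 4)

if TORCH_VERSION_AFTER_2_4:
    class Int8DynActInt4WeightQATQuantizer:
        """Real 8da4w QAT quantizer implementation lives here."""
        ...
else:
    class Int8DynActInt4WeightQATQuantizer:
        """Empty stub for older torch versions, raising a clear error message."""
        def __init__(self, *args, **kwargs):
            raise ValueError(
                "Int8DynActInt4WeightQATQuantizer requires torch 2.4 or later"
            )
```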
Summary: There are two subtle differences between the 8da4w quant primitives and `torch.fake_quantize_per_channel_affine` today:

1. 8da4w uses float32 zero points; torch.fake_quantize uses int32 zero points
2. 8da4w uses `input.div(scales)`; torch.fake_quantize uses `input.mul(1.0 / scales)`
Of these two differences, the second is smaller and only resulted in 0.1% of elements mismatching in unit tests, but it is a source of numerical divergence nonetheless.
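To see why the second difference matters at all, here is a minimal standalone sketch (not from this PR; the random inputs are made up purely to show that `div(scales)` and `mul(1.0 / scales)` can round to different integers for a small fraction of elements):

```python
import torch

torch.manual_seed(0)
x = torch.randn(10_000, dtype=torch.float32)
scales = torch.rand(10_000, dtype=torch.float32) + 0.01

# The two formulations are not bit-identical in floating point, so after
# rounding, a small fraction of elements can land on different integers.
q_div = torch.round(x.div(scales))
q_mul = torch.round(x.mul(1.0 / scales))

mismatch = (q_div != q_mul).float().mean().item()
print(f"fraction of mismatched elements: {mismatch:.4%}")
```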
This commit changes 8da4w QAT quant primitives to match the torch.fake_quantize behavior for both of these differences. In a future commit, we will change the 8da4w PTQ quant primitives as well so PTQ and QAT remain consistent.
Note: This commit also has the side effect of significantly reducing the memory footprint for bf16 inputs, since we now cast them to fp32 before multiplying them by the fp32 scales. The reduced memory usage is presumably because mixed bf16 * fp32 kernels are not as memory efficient.
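For illustration, here is a simplified sketch of the fake-quantize numerics described above: int32 zero points, `mul(1.0 / scales)` instead of `div(scales)`, and casting bf16 inputs to fp32 before scaling. The function name, shapes, and per-group reshaping are assumptions for this sketch (not the actual torchao code), and the straight-through-estimator wrapper used in real QAT training is omitted.

```python
import torch

def fake_quantize_per_channel_group(
    x: torch.Tensor,            # (out_features, in_features), fp32 or bf16
    scales: torch.Tensor,       # one fp32 scale per group
    zero_points: torch.Tensor,  # one int32 zero point per group (not fp32)
    quant_min: int = -8,
    quant_max: int = 7,
    group_size: int = 32,
) -> torch.Tensor:
    orig_dtype = x.dtype
    # Cast bf16 inputs to fp32 before scaling, as noted above.
    grouped = x.to(torch.float32).reshape(-1, group_size)
    s = scales.reshape(-1, 1)
    zp = zero_points.reshape(-1, 1)
    # Match torch.fake_quantize: multiply by 1/scale rather than divide by scale.
    q = torch.round(grouped.mul(1.0 / s)) + zp
    q = torch.clamp(q, quant_min, quant_max)
    # Dequantize back to the original dtype.
    dq = (q - zp) * s
    return dq.reshape(x.shape).to(orig_dtype)

# Example usage with made-up shapes: 4 output channels, 64 inputs, group size 32.
x = torch.randn(4, 64, dtype=torch.bfloat16)
scales = torch.rand(4 * 64 // 32, dtype=torch.float32) + 0.01
zero_points = torch.zeros(4 * 64 // 32, dtype=torch.int32)
out = fake_quantize_per_channel_group(x, scales, zero_points)
```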
Test Plan:
python test/quantization/test_qat.py -k test_qat_generic_fake_quantize
Reviewers: jerryzh168, cpuhrsch
Subscribers: jerryzh168, cpuhrsch, supriyar