Add more options in choose_qparams_affine for tinygemm op #227
Conversation
we need to compare with the existing numerics of choosing qparams and q/dq. If we're introducing error for no benefit then I don't think we should deprecate the other primitives. At the very least we need to discuss the functional changes first.
we also need to test whether these primitives work with gptq; there are nuances that I'm not sure are being captured.
sounds good, updated the op to work with tinygemm numerics. for gptq and other types of e2e quant tests, we could establish a dashboard for perf and accuracy first; we can prioritize this in 0.3
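For reference, a minimal self-contained sketch of the kind of numerics comparison being discussed: quantize/dequantize round trip plus an SQNR metric. The helper names (`sqnr`, `quant_dequant`) and the per-row 4-bit affine scheme are illustrative, not torchao APIs:

```python
import torch

def sqnr(x: torch.Tensor, y: torch.Tensor) -> float:
    """Signal-to-quantization-noise ratio in dB; higher means closer numerics."""
    return (20 * torch.log10(x.norm() / (x - y).norm())).item()

def quant_dequant(x, scale, zero_point, qmin=0, qmax=15):
    # affine quantize then dequantize, so the result can be compared to x
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

weight = torch.randn(32, 64)
min_val = weight.amin(dim=-1, keepdim=True)
max_val = weight.amax(dim=-1, keepdim=True)
scale = (max_val - min_val) / 15
zero_point = torch.round(-min_val / scale)
print(f"SQNR: {sqnr(weight, quant_dequant(weight, scale, zero_point)):.1f} dB")
```

Running this with the old and new primitives side by side would show whether the unified op changes the reconstruction error.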
Commits updated from 8d76b57 to 40d5a75.
updated the op to close the accuracy gap, please take a look again
(scale_ao, _) = get_group_qparams_symmetric(weight, n_bit, groupsize)
torch.testing.assert_allclose(scale_obs, scale_ao, rtol=0, atol=0)
(scale_ao, _) = get_group_qparams_symmetric(weight, n_bit, groupsize, precision=torch.float16)
torch.testing.assert_close(scale_obs, scale_ao, rtol=0, atol=0)
why this change?
you mean assert_close? I saw assert_allclose is deprecated, that's why I updated them
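For context, `torch.testing.assert_allclose` is deprecated in favor of `torch.testing.assert_close`. One behavioral difference worth noting: `assert_close` also checks dtype (and device) by default, which is plausibly why the `precision=torch.float16` argument appears in the updated test. A minimal migration example:

```python
import torch

a = torch.tensor([1.0, 2.0], dtype=torch.float16)
b = torch.tensor([1.0, 2.0], dtype=torch.float16)

# Deprecated API (did not check dtype):
# torch.testing.assert_allclose(a, b, rtol=0, atol=0)

# Replacement. assert_close checks dtype and device by default, so both
# tensors must be produced in the same precision (or pass check_dtype=False).
torch.testing.assert_close(a, b, rtol=0, atol=0)
```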
target_dtype (torch.dtype): dtype for target quantized Tensor
quant_min (Optional[int]): minimum quantized value for target quantized Tensor
quant_max (Optional[int]): maximum quantized value for target quantized Tensor
eps (Optional[float]): minimum scale, if not provided, defaults to eps of input.dtype
scale_dtype (torch.dtype): dtype for scale Tensor
zero_point_dtype (torch.dtype): dtype for zero_point Tensor
_is_zero_exactly_representable (bool): a private flag to indicate whether we need zero to be exactly representable
Interesting. I'd expect symmetric without zero_point to imply this. Is that true?
this is talking about whether the floating point value 0 is exactly representable by a quantized value or not; it's not related to symmetric/asymmetric quant. this is assumed by most of the existing quantized kernels
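To illustrate the point (a sketch with made-up numbers, not the actual op): with an integer zero_point, quantize(0.0) dequantizes back to exactly 0.0; with a fractional zero_point, as tinygemm-style kernels allow, it does not:

```python
import torch

x = torch.tensor([-0.9, 0.0, 1.7])
qmin, qmax = 0, 15
scale = (x.max() - x.min()) / (qmax - qmin)

def quant_dequant(v, zero_point):
    q = torch.clamp(torch.round(v / scale + zero_point), qmin, qmax)
    return (q - zero_point) * scale

# integer zero_point: 0.0 quantizes to the integer zp and dequantizes to exactly 0.0
zp_int = torch.round(-x.min() / scale) + qmin
print(quant_dequant(torch.tensor(0.0), zp_int))   # tensor(0.) -- exact

# fractional zero_point: 0.0 no longer survives the round trip
zp_frac = -x.min() / scale + qmin                 # not rounded
print(quant_dequant(torch.tensor(0.0), zp_frac))  # small nonzero residue
```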
target_dtype (torch.dtype): dtype for target quantized Tensor
quant_min (Optional[int]): minimum quantized value for target quantized Tensor
quant_max (Optional[int]): maximum quantized value for target quantized Tensor
eps (Optional[float]): minimum scale, if not provided, defaults to eps of input.dtype
scale_dtype (torch.dtype): dtype for scale Tensor
zero_point_dtype (torch.dtype): dtype for zero_point Tensor
is_exact_zero (bool): a flag to indicate whether we need zero to be exactly representable
nit: Can you also define what "zero being exactly representable" means for the outputs? You can add `zero_padding` as an example of when this is useful. Also I'd remove the `is_` part and choose `preserve_zero` or some other verb. Since this is a kwarg to a function and not a property of a class, that seems more consistent.
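As a concrete illustration of why zero padding makes this useful (a sketch; `residue` is a made-up stand-in for dq(q(0.0)) under a non-zero-preserving scheme), every zero-padded border element effectively picks up a constant bias:

```python
import torch
import torch.nn.functional as F

# residue stands in for what 0.0 dequantizes to when zero is not
# exactly representable; the value here is invented for illustration
residue = -0.033
x = torch.ones(1, 1, 4, 4)
padded_exact = F.pad(x, (1, 1, 1, 1), value=0.0)      # zero-preserving scheme
padded_drift = F.pad(x, (1, 1, 1, 1), value=residue)  # what the kernel effectively sees
print((padded_exact - padded_drift).abs().max())      # constant bias on the border
```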
Summary:
This is in preparation for replacing tinygemm q/dq ops with the unified quant primitive ops
tinygemm's choose_qparams op (and its quantize/dequantize ops: https://github.com/pytorch/ao/blob/main/torchao/quantization/quant_primitives.py#L36) is different from the other choose_qparams op in that it does not enforce that zero is exactly representable: the floating point value 0 generally can't be exactly represented by an integer value in the quantized tensor. This PR adds a flag to produce a zero_point value that can be adapted for use by tinygemm kernels.
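For readers unfamiliar with the difference, here is a hedged sketch contrasting the two conventions; the formulas are illustrative approximations, not the exact torchao implementation:

```python
import torch

def qparams(x: torch.Tensor, n_bit: int = 4, preserve_zero: bool = True):
    """Illustrative sketch of the two conventions, not the exact torchao code."""
    min_val = x.amin(dim=-1, keepdim=True)
    max_val = x.amax(dim=-1, keepdim=True)
    if preserve_zero:
        # standard affine: force the range to include 0 and round zero_point
        # to an integer, so the floating point value 0 round-trips exactly
        min_val = torch.clamp(min_val, max=0.0)
        max_val = torch.clamp(max_val, min=0.0)
        scale = (max_val - min_val) / (2**n_bit - 1)
        zero_point = torch.round(-min_val / scale)  # integer-valued
    else:
        # tinygemm-style: zero_point stays in the floating point domain
        # (a "mid-range" value), so 0 is generally not exactly representable
        scale = (max_val - min_val) / (2**n_bit - 1)
        zero_point = min_val + scale * 2 ** (n_bit - 1)
    return scale, zero_point

scale, zp = qparams(torch.randn(8, 64), preserve_zero=False)
```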
Test Plan:
python test/quantization/test_quant_primitives.py -k test_tinygemm_get_groupwise_affine_qparams
Reviewers:
Subscribers:
Tasks:
Tags: