[bc-breaking] enable direct configuration in quantize_ #1595

Open

wants to merge 9 commits into main

Conversation


@vkuzo vkuzo commented Jan 22, 2025

summary

This PR enables passing per-workflow arguments to quantize_ directly, without wrapping them in a Callable.

Motivation: passing configuration directly is intuitive and widely used in similar contexts across various projects. Passing configuration wrapped in a callable is IMO not intuitive, hard to understand and debug, and we have evidence that it pushes a portion of users away from building on top of torchao.

We will keep the old callable syntax supported by quantize_ for one release cycle, and delete it afterwards. We will keep the old names as aliases for new names going forward (example: int4_weight_only as an alias of Int4WeightOnlyConfig) to keep existing callsites working without changes.
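As a rough sketch of what the BC alias could look like (assuming the new config classes are plain dataclasses, as shown later in this PR; the exact mechanism may differ):

```python
from dataclasses import dataclass

# simplified stand-in for the new user-facing config class added in this PR
@dataclass
class Int4WeightOnlyConfig:
    group_size: int = 128

# BC alias: the old lowercase name points at the new config class, so
# int4_weight_only(group_size=32) constructs an Int4WeightOnlyConfig and
# existing callsites keep working unchanged
int4_weight_only = Int4WeightOnlyConfig

assert isinstance(int4_weight_only(group_size=32), Int4WeightOnlyConfig)
```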

user facing API changes

signature of quantize_

#
# before
#
def quantize_(
    model: torch.nn.Module,
    apply_tensor_subclass: Callable[[torch.nn.Module], torch.nn.Module],
    ...,
): ...

#
# after - intermediate state, support both old and new for one release
#
def quantize_(
    model: torch.nn.Module,
    config: Union[AOBaseConfig, Callable[[torch.nn.Module], torch.nn.Module]],
    ...,
): ...

#
# after - long term state
#
def quantize_(
    model: torch.nn.Module,
    config: AOBaseConfig,
    ...,
): ...

usage example

An example for int4_weight_only

#
# before
#
quantize_(m, int4_weight_only(group_size=32))

#
# after, with new user facing names
#
quantize_(m, Int4WeightOnlyConfig(group_size=32))

#
# AND, after, with BC names
#
quantize_(m, int4_weight_only(group_size=32))
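
A side benefit of dataclass configs (the commit summary calls it "enable printing of configuration") is that printing a config shows its settings. A small illustration, assuming Int4WeightOnlyConfig as sketched above:

```python
config = Int4WeightOnlyConfig(group_size=32)
print(config)
# a dataclass config prints its fields, e.g.:
# Int4WeightOnlyConfig(group_size=32)
```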

developer facing changes

See the PR details for examples, but they can be summarized as:

#
# old
#

# quantize_ applies the callable returned by this function to each module of the model
def int4_weight_only(group_size: int, ...) -> Callable:

    def new_callable(weight: torch.Tensor):
        # configuration is captured here via local variables
        ...
        
    # return type is a Callable
    return _get_linear_subclass_inserter(new_callable)

#
# new
#

# config base class
class AOBaseConfig(abc.ABC):
    pass

# user facing configuration of a workflow
@dataclass
class Int4WeightOnlyConfig(AOBaseConfig):
    group_size: int = 128
    ...

# not user facing: transforms a module according to a workflow's configuration
@register_quantize_module_handler(Int4WeightOnlyConfig)
def _int4_weight_only_transform(
    module: torch.nn.Module, 
    config: Int4WeightOnlyConfig,
) -> torch.nn.Module:
    # map to AQT, not user facing
    ...
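
To make the dispatch concrete, here is a rough sketch of how the registration decorator and quantize_ could fit together. The registry name _QUANTIZE_CONFIG_HANDLER and the module traversal below are illustrative assumptions for this sketch, not necessarily the exact torchao internals:

```python
from typing import Callable, Dict, Type

import torch

# illustrative registry: maps a config type to its module transform function
_QUANTIZE_CONFIG_HANDLER: Dict[Type, Callable] = {}

def register_quantize_module_handler(config_type):
    # register `handler` as the transform for modules configured with `config_type`
    def decorator(handler):
        _QUANTIZE_CONFIG_HANDLER[config_type] = handler
        return handler
    return decorator

def quantize_(model: torch.nn.Module, config, filter_fn=None) -> None:
    # look up the handler registered for this config type and apply it to every
    # module that passes the filter (traversal and module replacement simplified)
    handler = _QUANTIZE_CONFIG_HANDLER[type(config)]
    for fqn, module in model.named_modules():
        if filter_fn is None or filter_fn(module, fqn):
            handler(module, config)  # the real code swaps in the returned module
```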

current status

The current PR migrates three user facing workflows:

  • PTQ's int4_weight_only
  • QAT's intx_quantization_aware_training and from_intx_quantization_aware_training

I've chosen to migrate one PTQ and two QAT workflows to prove the generality of the new flow while keeping this PR's line count low and easy to review. We will migrate the rest of the workflows in future PRs; they are listed below:

  • int8_dynamic_activation_int4_weight
  • int8_dynamic_activation_int8_weight
  • int8_dynamic_activation_int8_semi_sparse_weight
  • int8_weight_only
  • float8_weight_only
  • float8_dynamic_activation_float8_weight
  • float8_static_activation_float8_weight
  • uintx_weight_only
  • fpx_weight_only
  • gemlite_uintx_weight_only
  • callsites from the prototype folder

After a release cycle, we will delete the old callable syntax.
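
During the transition, quantize_ needs to tell the two call styles apart. A minimal sketch of what that branch could look like; the helpers _apply_config and _apply_callable and the warning text are hypothetical, the isinstance split is the point:

```python
import warnings
from typing import Callable, Union

import torch

from torchao.core.config import AOBaseConfig  # module added in this PR

def _apply_config(model: torch.nn.Module, config: AOBaseConfig, filter_fn) -> None:
    # hypothetical helper: dispatch to the handler registered for type(config)
    ...

def _apply_callable(model: torch.nn.Module, fn: Callable, filter_fn) -> None:
    # hypothetical helper: the pre-existing callable-based replacement path
    ...

def quantize_(
    model: torch.nn.Module,
    config: Union[AOBaseConfig, Callable[[torch.nn.Module], torch.nn.Module]],
    filter_fn=None,
) -> None:
    if isinstance(config, AOBaseConfig):
        # new path: config object, handled via register_quantize_module_handler
        _apply_config(model, config, filter_fn)
    else:
        # old callable path, kept for one release cycle and then removed
        warnings.warn(
            "passing a Callable to quantize_ is deprecated, pass an AOBaseConfig instead",
            DeprecationWarning,
        )
        _apply_callable(model, config, filter_fn)
```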

Test Plan:

pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics
pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone
pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path

@vkuzo vkuzo commented Jan 22, 2025

Stack from ghstack (oldest at bottom):

pytorch-bot bot commented Jan 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1595

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 26850da with merge base 8afd10e:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vkuzo added a commit that referenced this pull request Jan 22, 2025
Summary:

POC for:

* decoupling configuration from transformation
* stop passing obscure stateful callables around
* enable printing of configuration
* reduce amount of context switching to navigate the logic from `quantize_` to
  quantizing a single module

TODO more polish before wider discussion.

Test Plan:

```
pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics
pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone
pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path
```

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: fb0703f88413bc06962dacde24ff6bb7cf0f3b19
ghstack-comment-id: 2607756510
Pull Request resolved: #1595
@facebook-github-bot facebook-github-bot added the CLA Signed label Jan 22, 2025
@vkuzo vkuzo changed the title from "[wip] configs configs configs!" to "[rfc] enable direct configuration in quantize_, v2" on Jan 22, 2025
@vkuzo vkuzo added the topic: bc-breaking label Jan 22, 2025
@vkuzo vkuzo changed the title from "[rfc] enable direct configuration in quantize_, v2" to "[bc-breaking] enable direct configuration in quantize_, v2" on Jan 23, 2025
@vkuzo vkuzo changed the title from "[bc-breaking] enable direct configuration in quantize_, v2" to "[bc-breaking] enable direct configuration in quantize_" on Jan 23, 2025
@andrewor14 andrewor14 left a comment

Looks great! Mostly just minor doc nits.

@@ -180,8 +187,13 @@ def apply_uint6_weight_only_quant(linear):
)
@unittest.skipIf(not torch.cuda.is_available(), "Need CUDA available")
def test_print_quantized_module(self, apply_quant):
print(apply_quant)

remove?

quantize_(linear, apply_quant)
ql = linear
else:
ql = apply_quant(linear)

once we migrate all functions to configs we won't need this check anymore right? Should we add a TODO to remove it?

@@ -0,0 +1,10 @@
import abc

I feel we can just add this to torchao/config.py without making a new core directory. No strong preference though


slightly stronger preference is I feel "core" shouldn't appear in the import, so users should be able to do this:

from torchao.config import AOBaseConfig

but we can do that by adding this to __init__.py
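
(One way to read this suggestion is a plain re-export, sketched below against the torchao/core/config.py location used in this PR; the exact public import path is what's being discussed.)

```python
# torchao/__init__.py (sketch)
# re-export so the public import path does not include "core"
from torchao.core.config import AOBaseConfig
```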

@@ -1185,7 +1185,7 @@ def test_qat_prototype_bc(self):
@unittest.skipIf(
not TORCH_VERSION_AT_LEAST_2_4, "skipping when torch version is 2.4 or lower"
)
def test_quantize_api(self):
def test_quantize_api_standalone(self):

do we need this change?

@@ -315,22 +328,31 @@ def from_intx_quantization_aware_training() -> Callable:
)

need to update the docstring here in the previous line

@@ -269,37 +287,32 @@ def intx_quantization_aware_training(
`torch.nn.Embedding` with an activation config, then we will raise

I can't comment up there but need to update the docstring in L282

"""
If a workflow config inherits from this then `quantize_` knows
how to apply it to a model.
"""

should we add a paragraph here or under quantize_ about how this is related to register_quantize_module_handler, so users who wish to add their own configs know how to do it?

handler,
_is_linear if filter_fn is None else filter_fn,
device=device,
extra_args=(config,),

alternatively we can pass in a lambda, then we don't need to add extra_args or pass in config:

replace_fn = lambda mod: handler(mod, config)

seems simpler

Contributor Author:

I'm really not a fan of passing callables around, it's easy when the callable is simple but easy for future people to tack ugly stuff on and increase complexity. Non-callable args make it harder to make the code ugly in the future.

Contributor:

oh sorry, I meant pass in replace_fn instead of handler, like:

replace_fn = lambda mod: handler(mod, config)
_replace_with_custom_fn_if_matches_filter(
            model,
            replace_fn,
            _is_linear if filter_fn is None else filter_fn,
            device=device,
)

either way you're passing a callable

] = {}


def register_quantize_module_handler(config_type):

nit: add some docstrings here to explain how this is related to quantize_ and AOBaseConfig?

Labels
CLA Signed (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed), topic: bc-breaking (use this tag if this PR breaks backward compatibility)
4 participants