
[WIP] Activation Aware Weight Quantization (AWQ) #743

Merged: 83 commits merged into pytorch:main on Oct 7, 2024

Conversation

@vayuda (Collaborator) commented on Aug 24, 2024

Adds AWQ (Activation-aware Weight Quantization) per #530.

To do:

  • Verify correctness of the implementation and add tests
  • Fold activation scaling into the previous layer where applicable
  • Make sure the model works with torch.compile
  • Move the implementation into the right files
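For background, the core of AWQ is an equivalent transform: per-input-channel scales derived from calibration activations are folded into the weight (and divided out of the preceding layer), so the matmul output is unchanged while the salient weight channels become easier to quantize. The sketch below illustrates that idea; it follows the AWQ paper rather than this PR's code, and the helper name, fixed alpha, and clamp value are illustrative assumptions.

import torch

# Hypothetical helper illustrating the AWQ equivalent transform (per the AWQ paper),
# not the code added in this PR. Scaling weight input channels by s while dividing
# the incoming activations by s (folded into the previous layer) leaves
# y = x @ W.T unchanged, but makes the scaled weight easier to quantize to low bits.
def awq_scale_weight(weight: torch.Tensor, act_magnitude: torch.Tensor, alpha: float = 0.5):
    # weight: [out_features, in_features]
    # act_magnitude: mean |activation| per input channel, collected during calibration
    s = act_magnitude.clamp(min=1e-5) ** alpha  # per-input-channel scale; alpha is
                                                # normally picked by a small grid search
    scaled_weight = weight * s                  # quantize this scaled weight
    return scaled_weight, s                     # 1/s is folded into the previous layer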

pytorch-bot (bot) commented on Aug 24, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/743

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e7e329b with merge base 09b8b3c:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Aug 24, 2024
return insert_subclass


def awq_uintx(quant_dtype: torch.dtype = torch.uint4,
Contributor commented:

can you just remove weight_quant_fn and add use_hqq to the function?

Contributor commented:

also when it's uint4, I think it's fine to just use TensorCoreTiledLayout

@vayuda (Collaborator, Author) replied:

I added that as a feature so people can find ways to compose on top of AWQ, especially if they come up with new kernels, but I would agree that it's not a necessary feature for an initial release.

@jerryzh168 (Contributor) left a comment:

Looks good, thanks for addressing all the comments! I think the main thing remaining is just to use a use_hqq flag for awq_uintx so we don't take an arbitrary weight_quant_fn.

Also, make sure to fix the CI issues.
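For reference, here is a sketch of the suggested interface (an assumption, not the merged signature): drop the free-form weight_quant_fn argument and expose a use_hqq flag instead, and for torch.uint4 just use the tensor-core tiled layout as suggested above. Parameter names other than quant_dtype are hypothetical.

import torch

# Sketch of the reviewer's suggested interface; parameter names besides quant_dtype
# are assumptions and the body is elided.
def awq_uintx(quant_dtype: torch.dtype = torch.uint4,
              group_size: int = 64,
              use_hqq: bool = False):
    """Return a quantization routine for AWQ-scaled uintx weights.

    use_hqq: quantize the scaled weight with HQQ instead of the default affine
    scheme; when quant_dtype is torch.uint4 the TensorCoreTiledLayout can be
    used directly, per the review discussion.
    """
    ...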

@vayuda merged commit f81fe11 into pytorch:main on Oct 7, 2024 (17 checks passed)
jainapurva pushed a commit referencing this pull request on Oct 9, 2024: Integrate AWQ within the TorchAO framework
jainapurva pushed a commit referencing this pull request on Oct 15, 2024: Integrate AWQ within the TorchAO framework
@HDCharles (Contributor) commented:

This shouldn't be in generate.py; it should be in eval so we can actually see the accuracy impact.
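For context on measuring that accuracy impact, a rough sketch using lm-evaluation-harness is below; the checkpoint name and task are placeholders, and the exact wiring in torchao's eval script may differ.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
import lm_eval
from lm_eval.models.huggingface import HFLM

# Placeholder checkpoint; any causal LM supported by transformers works here.
model_id = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# ... apply AWQ calibration and quantization to `model` here ...

# Evaluate on a perplexity/accuracy task so the quantization's effect is visible,
# rather than only benchmarking generation speed.
results = lm_eval.simple_evaluate(
    model=HFLM(pretrained=model, tokenizer=tokenizer),
    tasks=["wikitext"],
)
print(results["results"]["wikitext"])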

@vayuda deleted the awq branch on November 10, 2024
Labels: CLA Signed
Projects: None yet
Participants: 4