SmoothQuant mappings tutorial #115
Conversation
Followed for the most part, apart from one comment for clarity. Maybe throw this into a grammar editor to help with some of the wording. Otherwise, LGTM.
Just a clarity question, otherwise LGTM.
@robertgshaw2-neuralmagic would be good to get your feedback
Dipika, SmoothQuant shifts the difficulty of quantizing activations over to the weights by scaling up the weights and applying the inverse transformation to the activations; to accomplish this it needs to know which layers the smoothed activations pass into, i.e. which layers' weights need to be scaled up.
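For illustration, here is a minimal sketch of that per-channel smoothing transform. The function name, tensor shapes, and the `alpha` parameter are assumptions for this example, not the modifier's actual implementation:

```python
import torch


def smooth(act_scales: torch.Tensor, weight: torch.Tensor, alpha: float = 0.5):
    """Shift quantization difficulty from activations to weights (illustrative sketch).

    act_scales: per-channel max |activation| feeding the layer, shape (in_features,)
    weight:     weight of a layer consuming that activation, shape (out_features, in_features)
    """
    # Per-input-channel weight scales (max |w| over the output dimension)
    weight_scales = weight.abs().max(dim=0).values.clamp(min=1e-5)

    # SmoothQuant scaling factor: s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)
    scales = act_scales.clamp(min=1e-5).pow(alpha) / weight_scales.pow(1 - alpha)

    # Activations are divided by s (folded into the layer that produces them),
    # and the weights of every layer that consumes them are multiplied by s.
    smoothed_weight = weight * scales.unsqueeze(0)
    return scales, smoothed_weight
```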
Yes, I understand that. My comment was that I did not understand the wording.
I tried to address it in my last commit; kindly let me know if that's better.
Why do inputs and outputs need to be specified in the mappings? Correct me if I'm wrong, but SmoothQuant is not applicable to layers with multiple inputs, so specifying the input module should be enough to uniquely identify that activation, right?
You are correct, we could do something like that, but that's an algorithmic change which is out of scope for this PR.
@rahul-tuli We can make a future change to support output inference, conditioned on the type of the mapping.
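Purely as a hypothetical illustration of the output-inference suggestion (none of these names or tables exist in llm-compressor), the balance layers for a known architecture type could be derived from the smooth layer alone:

```python
from torch import nn

# Hypothetical lookup: for a llama-like mapping type, the layers fed by each
# smoothed activation are fixed by the architecture, so only the smooth layer
# would need to be specified and the balance layers could be inferred.
INFERRED_BALANCE_SUFFIXES = {
    "input_layernorm": ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],
    "post_attention_layernorm": ["mlp.gate_proj", "mlp.up_proj"],
}


def infer_balance_layers(model: nn.Module, smooth_layer_name: str) -> list:
    """Return names of layers whose weights would be scaled up for this smooth layer."""
    parent, _, leaf = smooth_layer_name.rpartition(".")
    prefix = f"{parent}." if parent else ""
    existing = {name for name, _ in model.named_modules()}
    candidates = [f"{prefix}{s}" for s in INFERRED_BALANCE_SUFFIXES.get(leaf, [])]
    return [name for name in candidates if name in existing]
```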
* Adding MSE Clipping Support
* Updating docstring, weight single-update hack
* Only compute min-max values once for weight
* Adding tests, handling int inputs
Description:
This PR adds a tutorial that guides users on specifying the correct mappings for the SmoothQuant Modifier in llm-compressor. The tutorial explains how to smooth inputs to the q/k/v projections in self-attention and the fc1 block in feed-forward layers, based on the SmoothQuant paper.

Changes:
- Added src/llmcompressor/modifiers/smoothquant/README.md: the tutorial provides instructions for identifying layers, targeting leaf modules, and using regular expressions for mappings (an example mapping is sketched below).
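For illustration, a mapping for an OPT-style model that smooths the inputs to the q/k/v projections and fc1 might look like the following sketch. The regexes, the smoothing_strength argument, and the exact module names are assumptions based on the linked smoothquant/base.py and may need adjusting for a given model:

```python
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# Each mapping pairs the layers to balance (whose weights are scaled up) with the
# layer producing the activation to smooth, matched via "re:" regular expressions.
mappings = [
    # q/k/v projections consume the output of the self-attention layer norm
    [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*self_attn_layer_norm"],
    # fc1 in the feed-forward block consumes the output of the final layer norm
    [["re:.*fc1"], "re:.*final_layer_norm"],
]

modifier = SmoothQuantModifier(smoothing_strength=0.5, mappings=mappings)
```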
Motivation:
This tutorial simplifies applying SmoothQuant, helping users correctly target layers for efficient model quantization.
Testing:
Relevant Links:
- smoothquant/base.py