
SmoothQuant mappings tutorial #115

Merged: 9 commits into main on Oct 4, 2024
Conversation

rahul-tuli (Collaborator)

Description:

This PR adds a tutorial that guides users on specifying the correct mappings for the SmoothQuant Modifier in llm-compressor. The tutorial explains how to smooth inputs to the q/k/v projections in self-attention and the fc1 block in feed-forward layers, based on the SmoothQuant paper.

Changes:

  • New Tutorial:
    • Located at src/llmcompressor/modifiers/smoothquant/README.md, the tutorial provides instructions for identifying layers, targeting leaf modules, and using regular expressions for mappings.
    • A sample mapping for LLaMA-like models is included (a brief sketch follows this list).
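
For reference, a minimal sketch of what such a mapping might look like for LLaMA-like models, assuming the [balance_layers, smooth_layer] format discussed later in this conversation; the tutorial's README is the source of truth:

```python
# Illustrative only; see src/llmcompressor/modifiers/smoothquant/README.md for
# the authoritative version. Each entry pairs the layers whose inputs are
# smoothed (and whose weights absorb the scales) with the layer producing that
# activation.
llama_like_mappings = [
    # q/k/v projections in self-attention, balanced against the input LayerNorm
    [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*input_layernorm"],
    # first feed-forward projections, balanced against the post-attention LayerNorm
    [["re:.*gate_proj", "re:.*up_proj"], "re:.*post_attention_layernorm"],
]
```

These would then be passed to the SmoothQuant modifier through its mappings argument (exact parameter name may vary by llm-compressor version).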

Motivation:

This tutorial simplifies applying SmoothQuant, helping users correctly target layers for efficient model quantization.

Testing:

  • Manually reviewed and tested on a sample LLaMA model to ensure accuracy of instructions.

Relevant Links:

@dsikka (Collaborator) left a comment:

Followed for the most part apart from one comment for clarity. Maybe throw this into a grammar editor to help with some of the wording. Otherwise, LGTM

@rahul-tuli force-pushed the smooth-quant-mappings-tutorial branch from fbc306a to 56ba00b on September 6, 2024 04:05
@dsikka (Collaborator) left a comment:

just a clarity question otherwise LGTM.

@robertgshaw2-neuralmagic would be good to get your feedback

@rahul-tuli (Collaborator, Author)

> just a clarity question otherwise LGTM.
>
> @robertgshaw2-neuralmagic would be good to get your feedback

Dipika, SmoothQuant shifts the difficulty of quantizing activations to weights by scaling up the weights and applying the inverse transformation to the activations; to accomplish this it needs to know which layers the smoothed activations pass into, i.e. which layers' weights need to be scaled up.
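
To make that concrete, here is a toy sketch in plain PyTorch (illustrative only, not llm-compressor internals) of why the layer that receives the smoothed activations must have its weights rescaled:

```python
import torch

# Toy example: SmoothQuant divides incoming activations by a per-channel scale s
# and multiplies the next layer's weights by the same s, so the matmul result is
# unchanged while activation outliers shrink.
x = torch.randn(4, 8) * torch.tensor([1, 1, 1, 1, 1, 1, 1, 50.0])  # one outlier channel
W = torch.randn(8, 16)

# Simplified per-channel scale; the paper balances activation and weight ranges
# via a migration strength alpha.
s = x.abs().amax(dim=0)

x_smooth = x / s               # activations become easier to quantize
W_scaled = W * s.unsqueeze(1)  # the layer the activations pass into absorbs s

assert torch.allclose(x @ W, x_smooth @ W_scaled, atol=1e-4)
```

Because the scales are folded into the downstream weights, the mapping has to identify exactly those downstream layers, along with the module producing the activation being smoothed.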

@dsikka (Collaborator) commented Sep 6, 2024:

> > just a clarity question otherwise LGTM.
> >
> > @robertgshaw2-neuralmagic would be good to get your feedback
>
> Dipika, SmoothQuant shifts the difficulty of quantizing activations to weights by scaling up the weights and applying the inverse transformation to the activations; to accomplish this it needs to know which layers the smoothed activations pass into, i.e. which layers' weights need to be scaled up.

Yes, I understand that. My comment was that I did not understand the wording behind "layers smoothed inputs pass into"; it read awkwardly. Maybe something like "layers that have smoothed inputs being passed into", or something along those lines.

@rahul-tuli (Collaborator, Author)

> > > just a clarity question otherwise LGTM.
> > >
> > > @robertgshaw2-neuralmagic would be good to get your feedback
> >
> > Dipika, SmoothQuant shifts the difficulty of quantizing activations to weights by scaling up the weights and applying the inverse transformation to the activations; to accomplish this it needs to know which layers the smoothed activations pass into, i.e. which layers' weights need to be scaled up.
>
> Yes, I understand that. My comment was that I did not understand the wording behind "layers smoothed inputs pass into"; it read awkwardly. Maybe something like "layers that have smoothed inputs being passed into", or something along those lines.

I tried to address it in my last commit; kindly let me know if that's better.

@kylesayrs (Collaborator)

Why do inputs and outputs need to be specified in the mappings? Correct me if I'm wrong, but SmoothQuant is not applicable to layers with multiple inputs, so specifying the input module should be enough to uniquely identify that activation, right?

@rahul-tuli (Collaborator, Author)

> Why do inputs and outputs need to be specified in the mappings? Correct me if I'm wrong, but SmoothQuant is not applicable to layers with multiple inputs, so specifying the input module should be enough to uniquely identify that activation, right?

You are correct, we could do something like that, but that's an algorithmic change that is out of scope for this PR.

@kylesayrs (Collaborator)

@rahul-tuli We can make a future change to support output inference, conditioned on the type of the mapping, i.e. List[List[Union[List[str], str]]] is the current mapping type, and List[str] could be an inferrable mapping type.
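
A purely hypothetical sketch of how that type-conditioned handling might look (names and structure are illustrative only, not current llm-compressor code):

```python
from typing import List, Optional, Tuple, Union

FullMappings = List[List[Union[List[str], str]]]  # current explicit [balance_layers, smooth_layer] format
ShorthandMappings = List[str]                     # proposed inferrable format

def resolve_mappings(
    mappings: Union[FullMappings, ShorthandMappings],
) -> List[Tuple[List[str], Optional[str]]]:
    """Normalize either mapping form into (balance_layers, smooth_layer) pairs."""
    if mappings and isinstance(mappings[0], str):
        # Shorthand: each entry names only the layer whose input is smoothed; the
        # module producing that input would be inferred from the model graph, which
        # is possible because SmoothQuant only applies to single-input layers.
        return [([target], None) for target in mappings]  # None => infer later
    # Explicit form: each entry is [balance_layers, smooth_layer].
    return [(list(balance_layers), smooth_layer) for balance_layers, smooth_layer in mappings]
```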

@mgoin merged commit 8b1f1b2 into main on Oct 4, 2024
6 of 7 checks passed
@mgoin deleted the smooth-quant-mappings-tutorial branch on October 4, 2024 at 00:45
markmc pushed a commit to markmc/llm-compressor that referenced this pull request on Nov 13, 2024:
* Adding MSE Clipping Support

* Updating docstring, weight single-update hack

* Only compute min-max values once for weight

* Adding tests, handling int inputs