SmoothQuant mappings tutorial #115
Conversation
Followed for the most part, apart from one comment for clarity. Maybe throw this into a grammar editor to help with some of the wording. Otherwise, LGTM.
Just a clarity question, otherwise LGTM.
@robertgshaw2-neuralmagic would be good to get your feedback
Dipika, SmoothQuant shifts the difficulty of quantizing activations over to the weights by scaling up the weights and applying the inverse transformation to the activations; to accomplish this it needs to know which layers the smoothed activations pass into, i.e. which layers' weights need to be scaled up.
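For illustration, here is a minimal sketch of that per-channel smoothing transform. The function name, tensor shapes, and the `alpha` parameter are assumptions for this example, not the modifier's actual implementation:

```python
import torch


def smooth(act_scales: torch.Tensor, weight: torch.Tensor, alpha: float = 0.5):
    """Shift quantization difficulty from activations to weights (illustrative sketch).

    act_scales: per-channel max |activation| feeding the layer, shape (in_features,)
    weight:     weight of a layer consuming that activation, shape (out_features, in_features)
    """
    # Per-input-channel weight scales (max |w| over the output dimension)
    weight_scales = weight.abs().max(dim=0).values.clamp(min=1e-5)

    # SmoothQuant scaling factor: s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)
    scales = act_scales.clamp(min=1e-5).pow(alpha) / weight_scales.pow(1 - alpha)

    # Activations are divided by s (folded into the layer that produces them),
    # and the weights of every layer that consumes them are multiplied by s.
    smoothed_weight = weight * scales.unsqueeze(0)
    return scales, smoothed_weight
```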
Yes, I understand that. My comment was that I did not understand the wording.
I tried to address it in my last commit; kindly let me know if that's better.
Why do inputs and outputs need to be specified in the mappings? Correct me if I'm wrong, but SmoothQuant is not applicable to layers with multiple inputs, so specifying the input module should be enough to uniquely identify that activation, right?
You are correct, we could do something like that, but that's an algorithmic change which is out of scope for this PR.
@rahul-tuli We can make a future change to support output inference, conditioned on the type of the mapping.
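Purely as a hypothetical illustration of the output-inference suggestion (none of these names or tables exist in llm-compressor), the balance layers for a known architecture type could be derived from the smooth layer alone:

```python
from torch import nn

# Hypothetical lookup: for a llama-like mapping type, the layers fed by each
# smoothed activation are fixed by the architecture, so only the smooth layer
# would need to be specified and the balance layers could be inferred.
INFERRED_BALANCE_SUFFIXES = {
    "input_layernorm": ["self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj"],
    "post_attention_layernorm": ["mlp.gate_proj", "mlp.up_proj"],
}


def infer_balance_layers(model: nn.Module, smooth_layer_name: str) -> list:
    """Return names of layers whose weights would be scaled up for this smooth layer."""
    parent, _, leaf = smooth_layer_name.rpartition(".")
    prefix = f"{parent}." if parent else ""
    existing = {name for name, _ in model.named_modules()}
    candidates = [f"{prefix}{s}" for s in INFERRED_BALANCE_SUFFIXES.get(leaf, [])]
    return [name for name in candidates if name in existing]
```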
* Adding MSE Clipping Support
* Updating docstring, weight single-update hack
* Only compute min-max values once for weight
* Adding tests, handling int inputs
Description:
This PR adds a tutorial that guides users on specifying the correct mappings for the SmoothQuant Modifier in llm-compressor. The tutorial explains how to smooth inputs to the q/k/v projections in self-attention and the fc1 block in feed-forward layers, based on the SmoothQuant paper.

Changes:
- Added src/llmcompressor/modifiers/smoothquant/README.md: the tutorial provides instructions for identifying layers, targeting leaf modules, and using regular expressions for mappings (an example mapping is sketched below).
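For illustration, a mapping for an OPT-style model that smooths the inputs to the q/k/v projections and fc1 might look like the following sketch. The regexes, the smoothing_strength argument, and the exact module names are assumptions based on the linked smoothquant/base.py and may need adjusting for a given model:

```python
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# Each mapping pairs the layers to balance (whose weights are scaled up) with the
# layer producing the activation to smooth, matched via "re:" regular expressions.
mappings = [
    # q/k/v projections consume the output of the self-attention layer norm
    [["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"], "re:.*self_attn_layer_norm"],
    # fc1 in the feed-forward block consumes the output of the final layer norm
    [["re:.*fc1"], "re:.*final_layer_norm"],
]

modifier = SmoothQuantModifier(smoothing_strength=0.5, mappings=mappings)
```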
Motivation:
This tutorial simplifies applying SmoothQuant, helping users correctly target layers for efficient model quantization.
Testing:
Relevant Links:
- smoothquant/base.py