Implement HooksMixin #917
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
e2e tests
We briefly looked at the implications of using hooks with FSDP - are we taking care of that already or through this PR?
@dsikka I consider that to be out of scope for this PR. I consider FSDP to be unsupported as of now, although this PR makes it easier to support FSDP in the future. Modifying a module's parameter requires being in special FSDP contexts.

    @torch.no_grad()
    def pre_hook(module, _args):
        # modifying both training and handle training states is required
        with model._use_training_state(TrainingState.IDLE, HandleTrainingState.IDLE):
            with FullyShardedDataParallel.summon_full_params(model):
                # modify module weight. Doing so outside of the contexts
                # will raise a non-contiguous tensor error
                module.weight *= 0

We can bake these contexts into the
Overall looks good in cleaning up/unifying hooks
Current testing should test the changes with the QuantizationModifier - do we think this is the case for the other modifiers being tested?
The other thought I had was about a less common but potentially useful use case where a modifier may have hooks for different cases and may want to target turning off a specific subset as opposed to all of them - do we think the hooks mixin class can be extended easily to handle that?
I've tested with the e2e tests, although I can perform more rigorous testing if we think that's necessary.
Yes! There are good arguments to be made for enabling this kind of functionality within the GPTQ algorithm, and unifying hooks makes implementing this functionality much easier.
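One way the subset use case raised above could look is sketched here. This is not part of the PR: the class name `TaggedHooksSketch`, the `tag` argument, and the `wrap_hook` / `disable_hooks` methods are all hypothetical names chosen for illustration. Each hook is wrapped with a tag, and a context manager switches off only the requested tags.

```python
import contextlib
from functools import wraps


class TaggedHooksSketch:
    """Hypothetical helper: hooks carry a tag and can be disabled per tag."""

    def __init__(self):
        self._disabled_tags = set()

    def wrap_hook(self, hook, tag):
        # The returned callable is what would be handed to a hook
        # registration call such as module.register_forward_hook in PyTorch.
        @wraps(hook)
        def wrapped(*args, **kwargs):
            if tag in self._disabled_tags:
                return None  # this subset is currently switched off
            return hook(*args, **kwargs)

        return wrapped

    @contextlib.contextmanager
    def disable_hooks(self, *tags):
        previous = set(self._disabled_tags)
        self._disabled_tags.update(tags)
        try:
            yield
        finally:
            # restore whatever was disabled before entering the context
            self._disabled_tags = previous


calls = []
hooks = TaggedHooksSketch()
calibrate = hooks.wrap_hook(lambda: calls.append("calibration"), tag="calibration")
compress = hooks.wrap_hook(lambda: calls.append("compression"), tag="compression")

calibrate()
compress()
with hooks.disable_hooks("calibration"):
    calibrate()  # skipped
    compress()   # still runs
calibrate()      # runs again after the context exits
```

Keeping the disabled set on the instance rather than on the class means one modifier can silence its own hooks without affecting others; whether that is the right granularity is a design question for the maintainers.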
I'd suggest checking out the nightly test cases and making sure we're not running into any issues there. LGTM.
oh ignore my nightly comment.
Purpose
Changes
- HooksMixin
  - The _HOOKS_DISABLED attribute is a global variable attached to the class, which is used to disable hooks globally
  - The _hooks attribute is a local variable attached to each modifier, which lists all of the hooks created by that modifier
- QuantizationModifier, refactor calibration functions to reference the same function rather than generating hook functions
- SmoothQuantModifier
- WandaPruningModifier and SparseGPTModifier
- MagnitudePruningModifier and ConstantPruningModifier via LayerParamMasking
- LayerCompressor is not covered, since this will be handled by future data pipelines and doing so would add the BaseModel inheritance to the LayerCompressor class, which adds unnecessary complexity to this PR
Testing
- tests/llmcompressor/modifiers/utils/test_hooks.py
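The two attributes described under Changes can be illustrated with a minimal sketch. It is not the PR's implementation: the class name `HooksMixinSketch` and the methods `register_hook`, `disable_hooks`, and `remove_hooks` are assumptions, and `TinyModule` / `Handle` are dependency-free stand-ins for `torch.nn.Module` and the removable handle that `register_forward_hook` returns in PyTorch.

```python
import contextlib
from functools import wraps


class Handle:
    """Stand-in for the removable handle returned by hook registration."""

    def __init__(self, hooks, key):
        self._hooks, self._key = hooks, key

    def remove(self):
        self._hooks.pop(self._key, None)


class TinyModule:
    """Stand-in for torch.nn.Module that supports forward hooks only."""

    def __init__(self):
        self._forward_hooks = {}
        self._next_key = 0

    def register_forward_hook(self, hook):
        key = self._next_key
        self._next_key += 1
        self._forward_hooks[key] = hook
        return Handle(self._forward_hooks, key)

    def __call__(self, x):
        out = x * 2
        for hook in list(self._forward_hooks.values()):
            hook(self, (x,), out)
        return out


class HooksMixinSketch:
    # Class-level flag: one switch shared by every modifier (assumed layout).
    _HOOKS_DISABLED = False

    def __init__(self):
        # Instance-level list: only the hooks this modifier created.
        self._hooks = []

    @classmethod
    @contextlib.contextmanager
    def disable_hooks(cls):
        HooksMixinSketch._HOOKS_DISABLED = True
        try:
            yield
        finally:
            HooksMixinSketch._HOOKS_DISABLED = False

    def register_hook(self, module, hook):
        @wraps(hook)
        def wrapped(*args, **kwargs):
            if HooksMixinSketch._HOOKS_DISABLED:
                return None  # globally disabled: skip the hook body
            return hook(*args, **kwargs)

        handle = module.register_forward_hook(wrapped)
        self._hooks.append(handle)
        return handle

    def remove_hooks(self):
        for handle in self._hooks:
            handle.remove()
        self._hooks = []


seen = []
modifier = HooksMixinSketch()
module = TinyModule()
modifier.register_hook(module, lambda mod, args, out: seen.append(out))

module(1)  # hook fires and records the output
with HooksMixinSketch.disable_hooks():
    module(2)  # hook is skipped while the global flag is set
modifier.remove_hooks()
module(3)  # no hooks remain on the module
```

Wrapping every hook at registration time is what lets a single class-level flag silence all hooks without detaching them, while the per-instance list lets each modifier clean up only what it registered.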