GPTQ Quantized-weight Sequential Updating #177
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.
It's not very clear from this PR how you're differentiating between the layer and block cases you're adding support for
I think we should want to keep the current non-sequential implementation (option 1), because it is faster by ~20%. Does option 4 provide any accuracy, memory, or speed boost compared to option 3? I know AutoGPTQ supports option 4, but unless we have a strong reason I don't think it's a priority to add it now. As for the change from a boolean to an enum, I don't think it's necessary until we decide what to do about option 4; can we leave it as a bool for now? No need to add extra scaffolding for something that isn't on the roadmap.
Not sure what you're referring to. The PR description specifies implementations for options 1 and 3.
We shouldn't be adding it to the enum until it's supported. We should keep a bool for now.
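The bool-vs-enum question in this thread can be sketched as follows. This is a hypothetical illustration of the two API shapes being discussed, not the actual GPTQModifier interface; the class and field names are assumptions.

```python
# Hedged sketch of the API question above: a plain boolean flag vs. enum
# scaffolding for the update strategy. Names are illustrative only and do
# not reflect llm-compressor's real GPTQModifier fields.
from dataclasses import dataclass
from enum import Enum


@dataclass
class GPTQOptionsBool:
    # Sufficient while only options 1 and 3 exist:
    sequential_update: bool = False


class UpdateStrategy(Enum):
    NON_SEQUENTIAL = "non_sequential"  # option 1: calibrate on unquantized activations
    SEQUENTIAL = "sequential"          # option 3: calibrate through the quantized prefix


@dataclass
class GPTQOptionsEnum:
    # Only worth the extra scaffolding once a third strategy (e.g. option 4)
    # is actually on the roadmap:
    update_strategy: UpdateStrategy = UpdateStrategy.NON_SEQUENTIAL
```

The design trade-off is the usual one: the boolean is simpler for users today, while the enum avoids a breaking rename later if a third strategy lands.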
- Re-performed replication
- Evaluated sequential
LGTM, could you add eval stats for `sequential_update=True` to the PR description as well? Just so we can easily track any regressions
Actually, it looks like there is a test failure related to this PR:
FAILED tests/llmcompressor/transformers/obcq/test_consecutive_runs.py::TestConsecutiveRunsSmall_0_commit::test_consecutive_runs_small - pydantic_core._pydantic_core.ValidationError: 1 validation error for GPTQModifier
sequential_update
Input should be a valid boolean [type=bool_type, input_value=None, input_type=NoneType]
@Satrat My mistake, I forgot to return the value after validation.
Looks like the test is passing now.
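The bug behind that failure follows a common validator pattern: a "before"-style validator normalizes a missing value but must also return it, because the validator's return value becomes the field value (pydantic applies exactly this rule, which is why a missing `return` surfaces as `input_value=None` failing the bool type check). A minimal, framework-free sketch of the pattern, with an assumed function name:

```python
# Hedged sketch of the bug class discussed above, without pydantic itself.
# The function name is hypothetical; only the return-the-value rule matters.
def validate_sequential_update(value):
    """Normalize a missing flag to the boolean default."""
    if value is None:
        value = False
    # Omitting this return would hand None to the downstream type check,
    # reproducing the "Input should be a valid boolean" failure above.
    return value
```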
Background
There are many ways in which sequential updating can be implemented.
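The two implementations this PR covers (options 1 and 3) can be sketched as follows. This is a toy illustration under stated assumptions, not llm-compressor code: layers are reduced to scalar weights, and `forward` / `quantize_layer` are hypothetical stand-ins for real calibration and GPTQ compression steps.

```python
# Hedged sketch contrasting non-sequential and sequential weight updating
# during layer-wise quantization. All names here are illustrative stand-ins.

def forward(layer, activations):
    # Toy forward pass: scale each activation by the layer weight.
    return [layer * a for a in activations]

def quantize_layer(layer, activations):
    # Toy quantization: round the weight. A real GPTQ step would use the
    # calibration activations to minimize quantization error.
    return round(layer)

def compress_non_sequential(layers, calib_inputs):
    # Option 1: every layer is calibrated on activations produced by the
    # original, unquantized model, so layers can be compressed independently.
    acts, cached = calib_inputs, []
    for layer in layers:
        cached.append(acts)
        acts = forward(layer, acts)
    return [quantize_layer(l, a) for l, a in zip(layers, cached)]

def compress_sequential(layers, calib_inputs):
    # Option 3 (sequential updating): quantize layers in order, re-running
    # calibration through the already-quantized prefix so each layer is
    # calibrated against upstream quantization error.
    acts, quantized = calib_inputs, []
    for layer in layers:
        q = quantize_layer(layer, acts)
        acts = forward(q, acts)
        quantized.append(q)
    return quantized
```

The sequential variant is the slower one because calibration activations must be recomputed after each layer is quantized, which matches the ~20% speed difference discussed in the review thread.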
Purpose
Changes
- `initialize_compression`
- `LayerCompressor` (has no effect)
Testing
- Ran with `sequential_update=True` and `sequential_update=False`
- Evaluated `sequential_update=False` on this branch and `sequential_update=False` on main and found the model evaluations to be the same