v0 add autoquant #402
base: main
PR Summary
This PR introduces support for 'autoquant', a new automatic quantization feature in the Infinity project. The changes span multiple files and include implementation, documentation, and testing updates.
- Added 'autoquant' as a new option in the Dtype enum and CLI documentation, enabling automatic quantization for improved model performance
- Implemented 'autoquant' support in the SentenceTransformerPatched class and quantization interface
- Added 'torchao' dependency to pyproject.toml, likely to support the new autoquant functionality
- Created a new test function to verify the autoquant feature's effectiveness and accuracy
- Updated README with information on new multi-modal support (CLIP, CLAP) and text classification capabilities
9 file(s) reviewed, 4 comment(s)
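As a rough illustration of the first summary item, the sketch below shows how a string-valued enum can gain an 'autoquant' member and how a CLI string could be mapped onto it. This is not the project's actual code: the other members and the `parse_dtype` helper are assumptions made for the example.

```python
from enum import Enum


class Dtype(str, Enum):
    """Illustrative stand-in for the project's Dtype enum (members assumed)."""

    float32 = "float32"  # assumed pre-existing member
    int8 = "int8"  # assumed pre-existing member
    autoquant = "autoquant"  # the option this PR adds


def parse_dtype(value: str) -> Dtype:
    """Hypothetical helper: map a CLI string to a Dtype with a readable error."""
    try:
        return Dtype(value)
    except ValueError:
        allowed = ", ".join(d.value for d in Dtype)
        raise ValueError(
            f"unknown dtype {value!r}; expected one of: {allowed}"
        ) from None


print(parse_dtype("autoquant"))
```

Subclassing both `str` and `Enum` keeps the members usable wherever a plain string is expected, which tends to be convenient for CLI options.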
@@ -7,6 +7,7 @@

import numpy as np
import requests  # type: ignore
import torch.ao.quantization
style: This import is unused in the current file. Consider removing it if not needed.
model = torch.quantization.quantize_dynamic(
    model.to("cpu"),  # the original model
    {torch.nn.Linear},  # a set of layers to dynamically quantize
    dtype=torch.qint8,
)
model = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
logic: Two quantization methods are applied sequentially. This might lead to unexpected behavior or reduced model performance. Consider using only one method or clarify why both are necessary.
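To make the redundancy concern concrete, here is a dependency-free numeric sketch of symmetric per-tensor int8 quantization applied once and then again to its own output. It is a simplified model chosen for illustration, not PyTorch's `quantize_dynamic` implementation, and it makes no claim about how PyTorch handles an already-quantized module; the function names and sample weights are assumptions.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: integer codes plus the scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


weights = [0.8, -0.31, 0.05, -1.2, 0.47]  # made-up example weights

# First pass: floats -> int8 codes.
codes_once, scale_once = quantize_int8(weights)

# Second pass over the already-quantized values.
codes_twice, scale_twice = quantize_int8(dequantize(codes_once, scale_once))

print(codes_once)
print(codes_twice)
```

In this simplified scheme the second pass reproduces the same integer codes, so it adds work without changing the result, which is consistent with the suggestion to keep a single quantization call.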
bettertransformer=False,
)
)
sentence = "This is a test sentence."
style: This line is unused and can be removed.
if __name__ == "__main__":
    test_autoquant_quantization()
style: Running a single test function in main might not be ideal. Consider using a test runner or removing this block if not necessary.
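One way to act on this comment is to leave the test as a plain `test_*` function and let a runner such as pytest discover it, with no `__main__` block. The sketch below shows that shape using an accuracy check based on cosine similarity. The two `embed_*` functions are hypothetical stand-ins for the unquantized and quantized models, and the 0.99 threshold is an assumed tolerance, not a value taken from the PR.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embed_reference(sentence: str) -> List[float]:
    """Hypothetical stand-in for the unquantized model's embedding."""
    return [(ord(ch) % 7) + 0.1 * (i % 5) for i, ch in enumerate(sentence[:16])]


def embed_quantized(sentence: str) -> List[float]:
    """Hypothetical stand-in for the quantized model: reference on a coarse grid."""
    return [round(x * 4) / 4 for x in embed_reference(sentence)]


def test_quantized_embeddings_match_reference() -> None:
    sentence = "This is a test sentence."
    similarity = cosine_similarity(
        embed_reference(sentence), embed_quantized(sentence)
    )
    # Assumed tolerance for illustration only.
    assert similarity > 0.99
```

Saved in a file whose name starts with `test_`, this function would be collected by running `pytest` on that file, so no explicit call is needed.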
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main     #402      +/-   ##
==========================================
- Coverage   79.01%   73.24%   -5.77%
==========================================
  Files          40       40
  Lines        3173     3184      +11
==========================================
- Hits         2507     2332     -175
- Misses        666      852     +186
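The percentages in this report are consistent with coverage computed as hits divided by total lines. The short check below recomputes them from the reported counts; the helper name is made up for the example.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Line coverage as a percentage, rounded to two decimals."""
    return round(100 * hits / lines, 2)


base = coverage_percent(hits=2507, lines=3173)  # main
head = coverage_percent(hits=2332, lines=3184)  # this PR
delta = round(head - base, 2)

print(f"main: {base}%  PR: {head}%  change: {delta:+}%")
```

The drop comes mostly from the hit count falling by 175 while only 11 lines were added, which suggests that previously covered code was not exercised in this run rather than that the new lines alone are untested.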
This pull request introduces several changes to the infinity_emb library, focusing on adding support for a new autoquant data type, updating documentation, and improving the quantization process. The most important changes include adding the autoquant data type, updating the CLI documentation, modifying quantization logic, and adding unit tests for autoquant quantization.

New Features:
- Added the autoquant data type to the Dtype enum in libs/infinity_emb/infinity_emb/primitives.py.
- Added support for autoquant in libs/infinity_emb/infinity_emb/transformer/quantization/interface.py and libs/infinity_emb/infinity_emb/transformer/quantization/quant.py.

Documentation Updates:
- Updated the CLI documentation to include autoquant in docs/docs/cli_v2.md.

Codebase Improvements:
- Updated the Makefile to use poetry run for generating OpenAPI and CLI v2 documentation in libs/infinity_emb/Makefile.

Dependency Updates:
- Added torchao as an optional dependency in libs/infinity_emb/pyproject.toml.

Testing Enhancements:
- Added unit tests for autoquant quantization in libs/infinity_emb/tests/unit_test/transformer/quantization/test_interface.py.
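Because torchao is described as an optional dependency, code that relies on it usually has to cope with the package being absent. The following is a minimal sketch of one common pattern, checking for the module before use and raising a clear error; both helper names are hypothetical and not taken from the PR.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_torchao() -> None:
    """Hypothetical guard: fail early when the optional dependency is missing."""
    if not is_installed("torchao"):
        raise ImportError(
            "dtype 'autoquant' needs the optional 'torchao' package; "
            "install it to use this feature"
        )


print(is_installed("torchao"))
```

Calling such a guard at the point where autoquant is selected keeps the rest of the library importable for users who never install the extra.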