[Tools] Add convert tool for Llama models quantized by AutoGPTQ #276
Conversation
It's preferable to add a parameter for quantization rather than creating a new converter, as this approach allows us to integrate new quantization methods, like AWQ, in the future.
And update the description of the Converter in the README.
How about
Yes, sure. I will do that in a new pull request for the quantization document.
if self.wbits == 8:
    qweight = qweight - 128  # uint8 to int8
else:
    # pack 2 uint4 to 1 int8
uint4x2
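A minimal sketch of what those two branches do once qweight holds one quantized value per element; the helper name and the low-nibble-first packing order are assumptions, not the PR's actual code:

import torch

def repack_qweight(qweight: torch.Tensor, wbits: int) -> torch.Tensor:
    # Hypothetical helper: qweight is assumed to already be unpacked to one
    # quantized value per element (0..255 for 8-bit, 0..15 for 4-bit).
    if wbits == 8:
        # Shift the unsigned range [0, 255] down to signed int8 [-128, 127].
        return (qweight.to(torch.int16) - 128).to(torch.int8)
    # 4-bit: pack two consecutive uint4 values into one byte ("uint4x2"),
    # low nibble first, so the packed dimension shrinks by half.
    low = qweight[..., 0::2] & 0x0F
    high = qweight[..., 1::2] & 0x0F
    return ((high << 4) | low).to(torch.uint8)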
config["llama"]["end_id"] = str(hf_config["eos_token_id"]) | ||
config["llama"]["weight_data_type"] = dtype | ||
|
||
config["llama"]["quant_decoder_weights"] = str(True) |
quant_decoder_weights -> quant_weight_data_type
int8,uint8,int4,uint4,fp8,fp4
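With that rename, the config would record which storage type the quantized weights use instead of a boolean flag. A sketch of how the written entry might look under the suggestion (key name from the review, surrounding keys from the diff, values illustrative):

import configparser

config = configparser.ConfigParser()
config["llama"] = {}
config["llama"]["weight_data_type"] = "fp16"
# Storage type of the quantized decoder weights; the review lists
# int8, uint8, int4, uint4, fp8 and fp4 as the candidate values.
config["llama"]["quant_weight_data_type"] = "int8"

with open("config.ini", "w") as f:
    config.write(f)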
@@ -53,16 +53,22 @@ def get_weight_data_type(self, dtype: str):
        else:
            raise Exception(f"{self.__class__.__name__} don't support convert weight to {dtype}.")

-    def convert(self, input_dir, output_dir=None, dtype: str = "fp16", processes=8):
+    def convert(self, input_dir, output_dir=None, dtype: str = "fp16", processes=8, quantization=None):
# from_quantized_model: convert from an HF int8 model to an xFT int8 model.
def convert(self, input_dir, output_dir=None, dtype: str = "fp16", processes=8, from_quantized_model=None):
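If the parameter lands under that name, a caller-side sketch might look like this (the import path and class name follow the existing xFT convert tools but are assumptions, as are the paths):

# Hedged usage sketch, not verified against this PR.
from xfastertransformer.tools import LlamaConvert

LlamaConvert().convert(
    "/path/to/llama-7b-gptq",     # HF model quantized by AutoGPTQ
    "/path/to/xft-llama-7b",      # output directory for xFT weights
    dtype="fp16",
    from_quantized_model="gptq",  # name per the reviewer's suggestion
)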
""" | ||
|
||
if quantization != "gptq": | ||
print(f"[ERROR] Quantization method {quantization} is not supported.") |
[ERROR] The input model must be a {quantization} quantized model.
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# load the model
# load the quantized model, do not do GPTQ quantization.
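In other words, the converter only reads the checkpoint that AutoGPTQ already produced; nothing is re-quantized. A minimal sketch of that loading step, assuming a single-file safetensors checkpoint (the file name and format are assumptions about the input model):

import os
from safetensors.torch import load_file

def load_quantized_checkpoint(input_dir: str) -> dict:
    # Read the AutoGPTQ checkpoint's raw tensors (qweight, qzeros, scales, ...)
    # directly; no GPTQ quantization is performed here.
    return load_file(os.path.join(input_dir, "model.safetensors"))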
elif "scales" in name: | ||
model_named_parameters[name] = param.float() | ||
elif "qzeros" in name: | ||
qzeros = param |
zeros is uint8/uint4 from GPTQ quantization and needs to be converted to fp32 for xFT.
elif "lm_head" in name: | ||
model_named_parameters[name] = param | ||
elif "scales" in name: | ||
model_named_parameters[name] = param.float() |
scales is fp16/fp32/bf16 from GPTQ quantization and needs to be converted to fp32 for xFT.
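The sign flip in the diff below falls out of the dequantization formula: GPTQ reconstructs a weight as w ≈ scales * (q - qzeros), while xFT applies w ≈ scales * q + zeros, so the converter stores zeros = -scales * qzeros. A tiny self-check of that identity (shapes are illustrative only):

import torch

scales = torch.rand(1, 8) + 0.5                # per-group scales
qzeros = torch.randint(0, 16, (1, 8)).float()  # unpacked zero points
q = torch.randint(0, 16, (4, 8)).float()       # unpacked quantized weights

zeros = -scales * qzeros                       # what the converter stores
assert torch.allclose(scales * (q - qzeros), scales * q + zeros)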
    zeros = -scales * qzeros
    model_named_parameters[name] = zeros
elif "qweight" in name:
    # qweight is not transposed
Convert uint8 qweight from GPTQ quantization to int8 for xFT.
Convert uint4 qweight from GPTQ quantization to uint4x2 for xFT.
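For reference, AutoGPTQ stores qweight as int32 with 32 // wbits values packed along the input-channel dimension, so the converter first has to unpack it before the int8 / uint4x2 conversion above. A minimal unpacking sketch (not the PR's code; the low-bits-first order is an assumption):

import torch

def unpack_gptq_qweight(qweight: torch.Tensor, wbits: int) -> torch.Tensor:
    # qweight: (in_features * wbits // 32, out_features), dtype int32.
    mask = (1 << wbits) - 1
    shifts = torch.arange(0, 32, wbits, dtype=torch.int32)
    # (rows, cols) -> (rows, 32 // wbits, cols) -> (rows * 32 // wbits, cols)
    unpacked = (qweight.unsqueeze(1) >> shifts.view(1, -1, 1)) & mask
    return unpacked.reshape(-1, qweight.shape[1]).to(torch.uint8)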
@xiangzez CI failed on the baichuan model?
We could rebase to check this PR's status.