Issue: Converting Compressed LLaMA2 Model to Hugging Face-Compatible Format
Description
We have successfully compressed a LLaMA2 model down to 4.4 billion parameters. However, I am encountering issues when trying to convert the compressed model to a Hugging Face-compatible format. Specifically, when I use the model.save_pretrained(output_dir) and tokenizer.save_pretrained(output_dir) methods, the reloaded model's parameter count reverts to the original 6.7 billion, and its output becomes incoherent.
Steps to Reproduce
Compress a LLaMA2 model to 4.4 billion parameters.
Use the following code to save the model:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def save_compressed_model(model, tokenizer, output_dir):
    # Save the model and tokenizer using Hugging Face's save_pretrained method
    model.save_pretrained(output_dir, safe_serialization=True)
    tokenizer.save_pretrained(output_dir)

# Load your compressed model
model_path = "path_to_your_compressed_model"
tokenizer_path = "path_to_your_tokenizer"
output_dir = "path_to_output_directory"

model = torch.load(model_path)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

# Save the model and tokenizer
save_compressed_model(model, tokenizer, output_dir)
Attempt to use the model from the output directory (a parameter-count sanity check is sketched just after this list).
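For reference, a quick way to check whether the compression survives the save/reload round trip is to compare parameter counts. This is a minimal sketch, assuming the compressed model is already in memory as model and the imports and output_dir from the snippet above:

def count_params(m):
    return sum(p.numel() for p in m.parameters())

# Parameter count of the in-memory compressed model (expected: roughly 4.4B)
print(f"in-memory: {count_params(model) / 1e9:.2f}B parameters")

# Parameter count after a round trip through save_pretrained / from_pretrained
reloaded = AutoModelForCausalLM.from_pretrained(output_dir)
print(f"reloaded:  {count_params(reloaded) / 1e9:.2f}B parameters")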
Observed Behavior
When reloaded from the output directory, the model's parameter count reverts to the original 6.7 billion.
The model's output degrades into random gibberish.
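To narrow down where the reversion happens, it may help to inspect the tensor shapes actually written to disk. A sketch, assuming a single-file model.safetensors in the output directory (a sharded checkpoint would need this repeated per shard file):

from safetensors import safe_open

# List each tensor's name and shape in the saved checkpoint without loading
# the weights, to see whether the compressed (factorized) shapes or the
# original full-rank shapes were written.
with safe_open("path_to_output_directory/model.safetensors", framework="pt") as f:
    for name in f.keys():
        print(name, f.get_slice(name).get_shape())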
Expected Behavior
The model should retain its compressed state with 4.4 billion parameters.
The model output should remain coherent and consistent with the compressed model's performance.
Additional Context
I have also attempted to convert the model to GGUF format, but encountered similar issues. Any guidance on correctly converting and saving the compressed model for Hugging Face would be greatly appreciated.
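One possible explanation, which is an assumption on our side and not confirmed: if the compression replaces the original nn.Linear layers with factorized modules, from_pretrained rebuilds the architecture from the saved config.json, which still describes the stock 6.7B LLaMA layout, so the reload silently reinstantiates the uncompressed shapes and the mismatched weights produce gibberish. A workaround sketch that bypasses the config round trip by serializing the whole module (hypothetical filename):

import torch

# Serialize the entire module (architecture + weights) so reloading does not
# depend on a config.json that no longer matches the compressed layers.
torch.save(model, "compressed_model_full.pt")

# Later, reload directly instead of via AutoModelForCausalLM.from_pretrained.
# weights_only=False is needed on recent PyTorch to unpickle a full module.
model = torch.load("compressed_model_full.pt", weights_only=False)
model.eval()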
Thank you for your assistance!
Just to simplify: we are able to compress and use the svdllm models. However, we are unable to convert them to Hugging Face formats such as safetensors or GGUF; all our conversion attempts have resulted in the models getting distorted or modified. Can you please help us figure this out?