Tutorial: How to convert HuggingFace model to GGUF format #2948
Replies: 34 comments 54 replies
-
You might want to add a small note that requantizing to other formats from
-
I have a model trained with QLoRA, and I can only convert it to a minimum of 8-bit quantization using GGUF. What about q4_K_S quantization? Why is it not available?
-
Can anyone help me debug this?
-
Is there a way to do this directly on Colab?
-
This way I can only get one file, such as a .gguf. Is it possible to convert a model into multiple formats reproducibly, the way TheBloke does on HuggingFace?
-
Hi @samos123, I'm only used to working with .gguf files for LLMs. I had no idea what to do with this kind of model, so I did a search and found your post. Am I right to assume that all models structured this way are HF models? Is there anywhere I can read more about this? All the YouTube videos seem to go straight to the quantized .gguf version. Are HF models considered the raw models that can be further tuned into something else? I have lots of assumptions but they are hard to verify.
-
Please tell me the difference between the roles of the following files.
My predictions are as follows.
Why aren't … Also, only …
-
Improved the download.py script:
This way you can just pass the model name on huggingface in the command line. It will remove the slash and replace it with a dash when creating the directory. Example:
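The snippet itself did not survive the page extraction; a minimal sketch of what a script matching this description could look like, assuming the huggingface_hub library (the function name `local_dir_for` is mine, not from the original):

```python
# Sketch of the improved download.py described above (the original snippet
# was lost in extraction). Pass the HF repo id on the command line; the
# slash in the id becomes a dash when naming the local directory.
import sys

try:
    from huggingface_hub import snapshot_download
except ImportError:  # pip install huggingface_hub
    snapshot_download = None


def local_dir_for(repo_id: str) -> str:
    # e.g. "microsoft/phi-2" -> "microsoft-phi-2"
    return repo_id.replace("/", "-")


if __name__ == "__main__" and len(sys.argv) > 1:
    repo_id = sys.argv[1]
    snapshot_download(repo_id=repo_id, local_dir=local_dir_for(repo_id))
```

Invoked, for example, as `python download.py microsoft/phi-2`, which downloads into a `microsoft-phi-2` directory.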
-
I'm having a
-
Hi, I ran into an odd error and was really struggling to find any relevant information online, so I'm hoping someone here can help. I know almost nothing about the technical side of things; I'm just an average AI text-gen user. I'm trying to convert models to GGUF and checked out the instructions both here and in this guide on Reddit. I managed to get convert.py working and can do FP16 and Q8 conversions without issue, but I ran into the same mysterious error repeatedly when trying to use quantize.exe to convert pretty much anything. I've tried with both this model (Mixtral Erotic) and this model (CatPPT). The error message is always the same:
The processing always gets stuck on "line: 1 char: 19"; I'm not sure why, and I can't really see which character it is specifically. BTW, I'm running in PowerShell; I just right-clicked on quantize.exe in Explorer and chose the option to navigate to that location. I'm not sure if that makes a difference. I'm wondering if the error is because I don't have llama.cpp installed correctly. Running quantize.exe through CMD gives an error about cudart64_12.dll missing, but downloading the cudart files and putting them into the same folder doesn't stop the error. If I'm only using convert.py and quantize.exe, do I still need to follow the CMake instructions on the llama.cpp main page to build llama.cpp from source? I've already run requirements.txt through Python, which is why convert.py is working for me, I think. It's just that for some reason quantize.exe doesn't work. Edit (Update):
-
As the errors state, you are mixing multiple models. Please properly download the files from HF microsoft/phi-2. Note: you can directly download GGUF-quantized Microsoft Phi-2 models from HF with hf.sh; for example, for a Q4_K_M: ./scripts/hf.sh --repo TheBloke/phi-2-GGUF --file phi-2.Q4_K_M.gguf
-
This might be useful. If anyone wants to help improve it, contributions are always welcome. https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script
-
This helped me a ton. It downloaded my LoRA combined with the base model correctly, and I was able to make my GGUF easily.
-
Hello everyone, I hope someone can help me with this error I am getting. I tried running the below: However, I get an error of: Does anyone have any ideas? Is this a problem with the model I am trying to convert?
-
In the materials I have researched, models trained using frameworks such as PyTorch and TensorFlow come in formats like … Or can the convert.py script in llama.cpp directly convert models in formats like …
-
Hi! It looks like torch does not support LFS, but I know such errors can be misleading... What is the actual issue, and does anyone know how to resolve it?
-
I tried to convert a llama-3-8b fine-tune (trained with axolotl) to GGUF but got this error. Does anyone know how to solve it?
INFO:convert:Loading model file /content/drive/My Drive/Colab Notebooks/output_new_run2/merged/model-00001-of-00004.safetensors
INFO:convert:Loading model file /content/drive/My Drive/Colab Notebooks/output_new_run2/merged/model-00002-of-00004.safetensors
INFO:convert:Loading model file /content/drive/My Drive/Colab Notebooks/output_new_run2/merged/model-00003-of-00004.safetensors
INFO:convert:Loading model file /content/drive/My Drive/Colab Notebooks/output_new_run2/merged/model-00004-of-00004.safetensors
INFO:convert:params = Params(n_vocab=128259, n_embd=4096, n_layer=32, n_ctx=8192, n_ff=14336, n_head=32, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyQ8_0: 7>, path_model=PosixPath('/content/drive/My Drive/Colab Notebooks/output_new_run2/merged'))
Traceback (most recent call last):
File "/content/llama.cpp/convert.py", line 1584, in <module>
main()
File "/content/llama.cpp/convert.py", line 1541, in main
vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
File "/content/llama.cpp/convert.py", line 1430, in load_vocab
vocab = self._create_vocab_by_path(vocab_types)
File "/content/llama.cpp/convert.py", line 1420, in _create_vocab_by_path
raise FileNotFoundError(f"Could not find a tokenizer matching any of {vocab_types}")
FileNotFoundError: Could not find a tokenizer matching any of ['spm', 'hfft']
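An editor's note with a hedged workaround, not from this thread: llama-3 ships a BPE tokenizer, while the legacy convert.py by default only looks for sentencepiece ('spm') or HF fast-tokenizer ('hfft') vocabularies. The commonly reported fix is to pass `--vocab-type bpe`, or to use convert-hf-to-gguf.py instead:

```shell
# Commonly reported workaround (an assumption, not confirmed in this thread):
# tell the legacy convert.py to look for llama-3's BPE tokenizer.
python convert.py "/content/drive/My Drive/Colab Notebooks/output_new_run2/merged" \
  --vocab-type bpe --outtype q8_0
# Newer llama.cpp trees replace convert.py with convert-hf-to-gguf.py,
# which detects the tokenizer type itself.
```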
-
Hey hey, everyone. I'm VB (GPU Poor @ Hugging Face). I just wanted to share that you can also create your quants using the GGUF-my-repo space in the ggml.ai org. The space is powered by a 16 vCPU + 128 GB RAM machine and benefits from HF's co-located storage infrastructure. This makes downloading repositories from the Hub, creating quants, and uploading them back relatively easy. All the code used to make the quants is open (we essentially wrap llama.cpp quantise). This way, you offload all your computing worries to the space. We're looking for ways to improve the overall experience of creating quants, so please feel free to submit issues/feedback/pull requests. Democratising quants one quant type at a time ^^ ref: https://huggingface.co/spaces/ggml-org/gguf-my-repo Or look at this tweet by @ggerganov: https://x.com/ggerganov/status/1776305900858265945 P.S. You can also create private quants with the space. 🤗
-
Thanks for the tutorial. I followed the instructions successfully. I tried to convert the model indonlp/cendol-llama2-13b-merged-chat, uploaded it to my repo, and tested it with llama.cpp:
But I always get:
I tried it with different models on HF, and I also tried q8, f16 and f32, but all give the same error. I also tried using convert.py and convert-hf-to-gguf.py, but I still get the same error when I run it. Does anyone know what causes this error? Is the problem caused by the conversion, or by the way I use it for inference?
-
For everyone other than me who is having trouble importing distutils:
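The fix itself was truncated above. A likely cause (my note, not the original poster's): distutils was removed from the Python standard library in 3.12, and installing setuptools (`pip install setuptools`) restores a compatible shim that llama.cpp's conversion scripts can import. A quick sanity check:

```python
# distutils was dropped from the stdlib in Python 3.12; the setuptools
# package ships a shim that makes "import distutils" work again.
import importlib.util
import sys

has_distutils = importlib.util.find_spec("distutils") is not None
print(f"Python {sys.version_info.major}.{sys.version_info.minor}; "
      f"distutils importable: {has_distutils}")
# If this prints False on Python >= 3.12, run: pip install setuptools
```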
-
notes:
llama.cpp/convert-hf-to-gguf-update.py Lines 63 to 86 in d6ef0e7
-
I'm new to this and am trying to convert an embedding model (https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5) to GGUF format. When I tried, I hit an error that seems to stem from here: llama.cpp/convert-hf-to-gguf.py Line 3118 in 1c5eba6. Any idea what the issue and fix are?
-
Hi, not only me but someone else also had this problem... Is it just me, or does it seem like BERT models are not supported, so they can't be converted to GGUF format?
-
I have this error message. How do I fix it?
-
I have deleted the code, but I remember that I failed with that method. In the end I just used the online GGUF converter pages on HuggingFace.
On Tue, 6 Aug 2024 at 13:43, gavin-edward ***@***.***> wrote:
… Thank you very much for your help. After building I ran quantize with:
quantize models/susnato_phi-1_5.gguf models/susnato_phi-1_5_q8_0.gguf Q8_0
And it works nicely. Cheers!
Hello, could you please share your build method with me?
-
I can't convert any models that do token classification. What's wrong?
-
Source: https://www.substratus.ai/blog/converting-hf-model-gguf-model/
I published this on our blog but thought others here might benefit as well, so I'm sharing the raw post here on GitHub too. I hope it's helpful, and feedback is welcome.
Downloading a HuggingFace model
There are various ways to download models, but in my experience the huggingface_hub library has been the most reliable. The git clone method occasionally results in OOM errors for large models.
Install the huggingface_hub library:
Create a Python script named download.py with the following content:
Run the Python script:
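The original code blocks were lost when the page was saved; below is a sketch of what download.py could look like, assuming `pip install huggingface_hub` has been run. The model id is an illustrative example, and `vicuna-hf` is the directory the later steps expect:

```python
# Sketch of download.py (the original snippet was lost in extraction).
# Assumes "pip install huggingface_hub" was run first; the model id is an
# illustrative example, and "vicuna-hf" is the directory used by later steps.
try:
    from huggingface_hub import snapshot_download
except ImportError:  # pip install huggingface_hub
    snapshot_download = None

MODEL_ID = "lmsys/vicuna-13b-v1.5"  # example repo id (assumption)
LOCAL_DIR = "vicuna-hf"


def download() -> str:
    # Fetch every file in the repo (weights, tokenizer, config) into LOCAL_DIR.
    return snapshot_download(repo_id=MODEL_ID, local_dir=LOCAL_DIR)


if __name__ == "__main__":
    download()
```

Run it with `python download.py`; the files land in `vicuna-hf/`.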
You should now have the model downloaded to a directory called vicuna-hf. Verify by running:
Converting the model
Now it's time to convert the downloaded HuggingFace model to a GGUF model.
Llama.cpp comes with a converter script to do this.
Get the script by cloning the llama.cpp repo:
Install the required python libraries:
Verify the script is there and understand the various options:
Convert the HF model to GGUF model:
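The command blocks for these four steps were stripped from the page; roughly, and noting that the converter script's name varies across llama.cpp versions (older trees ship convert.py, newer ones convert_hf_to_gguf.py), they could look like:

```shell
# Sketch of the four steps above (the original command blocks were lost).
# Check your llama.cpp checkout for the converter's exact name.
git clone https://github.com/ggerganov/llama.cpp.git
pip install -r llama.cpp/requirements.txt   # the converter's python deps
python llama.cpp/convert.py --help          # list the available options
python llama.cpp/convert.py vicuna-hf \
  --outfile vicuna-13b-v1.5.gguf \
  --outtype q8_0                            # 8-bit quantized output
```

Afterwards, `ls -lh *.gguf` is an easy way to verify the output file was created.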
In this case we're also quantizing the model to 8 bit by setting --outtype q8_0. Quantizing helps improve inference speed, but it can negatively impact quality. You can use --outtype f16 (16 bit) or --outtype f32 (32 bit) to preserve the original quality.
Verify the GGUF model was created:
Pushing the GGUF model to HuggingFace
You can optionally push the GGUF model back to HuggingFace.
Create a Python script with the filename upload.py that has the following content:
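The script body was lost in extraction; a sketch using huggingface_hub's HfApi, with a placeholder repo name you would replace with your own, and the write token read from an HF_TOKEN environment variable:

```python
# Sketch of upload.py (the original snippet was lost in extraction).
# The repo id is a placeholder; a write-scoped token is read from HF_TOKEN.
import os

try:
    from huggingface_hub import HfApi
except ImportError:  # pip install huggingface_hub
    HfApi = None

REPO_ID = "your-username/vicuna-13b-v1.5-gguf"  # placeholder target repo
GGUF_FILE = "vicuna-13b-v1.5.gguf"              # file produced by the convert step


def upload() -> None:
    api = HfApi(token=os.environ["HF_TOKEN"])
    # Create the target repo if needed, then push the single GGUF file.
    api.create_repo(repo_id=REPO_ID, exist_ok=True)
    api.upload_file(path_or_fileobj=GGUF_FILE,
                    path_in_repo=GGUF_FILE,
                    repo_id=REPO_ID)


if __name__ == "__main__" and os.environ.get("HF_TOKEN"):
    upload()
```

Then set the token and run it, e.g. `HF_TOKEN=<your write token> python upload.py`.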
Get a HuggingFace Token that has write permission from here:
https://huggingface.co/settings/tokens
Set your HuggingFace token:
Run the upload.py script: