
Supports SmolLM #495

Merged
jart merged 1 commit into Mozilla-Ocho:main on Jul 22, 2024
Conversation

@Stillerman (Contributor) commented Jul 21, 2024

These changes are needed to make a GGUF for SmolLM work with llamafile. The GGUF was generated with this PR for llama.cpp.

Tested with

llamafile-convert smol-135M.gguf
./smol-135M.llamafile  

@jart self-requested a review on July 22, 2024 06:24
@jart (Collaborator) commented Jul 22, 2024

Wow thanks for sending this. I checked out your llama.cpp PR and I'm having trouble creating a GGUF file. Any ideas?

Traceback (most recent call last):
  File "/home/jart/llama.cpp/convert_hf_to_gguf.py", line 3673, in <module>
    main()
  File "/home/jart/llama.cpp/convert_hf_to_gguf.py", line 3666, in main
    model_instance.write()
  File "/home/jart/llama.cpp/convert_hf_to_gguf.py", line 400, in write
    self.prepare_metadata(vocab_only=False)
  File "/home/jart/llama.cpp/convert_hf_to_gguf.py", line 348, in prepare_metadata
    self.metadata = gguf.Metadata.load(self.metadata_override, self.dir_model, self.model_name, total_params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jart/llama.cpp/gguf-py/gguf/metadata.py", line 59, in load
    metadata = Metadata.apply_metadata_heuristic(metadata, model_card, hf_params, model_path, total_params)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jart/llama.cpp/gguf-py/gguf/metadata.py", line 396, in apply_metadata_heuristic
    model_full_name_component, org_component, basename, finetune, version, size_label = Metadata.get_model_id_components(model_id, total_params)
                                                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jart/llama.cpp/gguf-py/gguf/metadata.py", line 233, in get_model_id_components
    if at_start and ((len(t) == 0 and part[0].isalpha()) or "version" in t):
                                      ~~~~^^^
IndexError: string index out of range

I ran this command:

$?=1 main jart@luna:/fast/hf/SmolLM-135M$ ~/llama.cpp/convert_hf_to_gguf.py --outtype bf16 .
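
For reference, the IndexError appears to come from indexing the first character of an empty model-name component in the metadata heuristic. A minimal sketch of that failure mode only (not the actual llama.cpp code, and the empty component is an assumption about the cause):

# sketch.py (hypothetical): part[0] on an empty string raises the same
# "string index out of range" error shown in the traceback above.
def first_char_is_alpha(part: str) -> bool:
    return part[0].isalpha()  # IndexError when part == ""

try:
    first_char_is_alpha("")  # simulates an empty model-name component
except IndexError as err:
    print(err)  # string index out of range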

@Stillerman (Contributor, Author)

Hmmm, looking into this now.

@Stillerman (Contributor, Author) commented Jul 22, 2024

Could you try a fresh copy of llama.cpp at d94c6e0 with the following environment?

(justine) jts@Jasons-MacBook-Air justine % uv pip freeze
certifi==2024.7.4
charset-normalizer==3.3.2
filelock==3.15.4
fsspec==2024.6.1
huggingface-hub==0.24.0
idna==3.7
jinja2==3.1.4
markupsafe==2.1.5
mpmath==1.3.0
networkx==3.3
numpy==1.26.4
packaging==24.1
pyyaml==6.0.1
regex==2024.5.15
requests==2.32.3
safetensors==0.4.3
sentencepiece==0.2.0
sympy==1.13.1
tokenizers==0.19.1
torch==2.3.1
tqdm==4.66.4
transformers==4.42.4
typing-extensions==4.12.2
urllib3==2.2.2

download.py

from huggingface_hub import snapshot_download

# https://huggingface.co/HuggingFaceTB/SmolLM-135M
model_id="HuggingFaceTB/SmolLM-135M"
snapshot_download(repo_id=model_id, local_dir="smol-135",
                    local_dir_use_symlinks=False, revision="main")

and then run

python download.py
python convert_hf_to_gguf.py smol-135 --outtype bf16

I was then able to run

make -j8
./llama-cli -m "smol-135/smol-135M-135-BF16.gguf" -p "hi there llama\!"

and inference seems to work.
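
For reference, the same steps can be scripted end to end. A rough sketch using the paths and filenames from the commands above (run from the llama.cpp checkout; the subprocess calls just wrap the shell commands shown):

# reproduce.py (sketch): download SmolLM-135M, convert it to a BF16 GGUF,
# build llama.cpp, and run a quick prompt.
import subprocess
from huggingface_hub import snapshot_download

snapshot_download(repo_id="HuggingFaceTB/SmolLM-135M",
                  local_dir="smol-135", revision="main")
subprocess.run(["python", "convert_hf_to_gguf.py", "smol-135",
                "--outtype", "bf16"], check=True)
subprocess.run(["make", "-j8"], check=True)
subprocess.run(["./llama-cli", "-m", "smol-135/smol-135M-135-BF16.gguf",
                "-p", "hi there llama!"], check=True)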

@jart (Collaborator) commented Jul 22, 2024

Conversion works now, although the filename it chooses is a little weird.

[screenshot of the generated filename]

@jart (Collaborator) left a comment

Fantastic. This model goes wicked fast on CPU.

llama_print_timings:        load time =      59.98 ms
llama_print_timings:      sample time =       1.16 ms /    30 runs   (    0.04 ms per token, 25862.07 tokens per second)
llama_print_timings: prompt eval time =      45.54 ms /   203 tokens (    0.22 ms per token,  4457.82 tokens per second)
llama_print_timings:        eval time =     185.74 ms /    29 runs   (    6.40 ms per token,   156.14 tokens per second)
llama_print_timings:       total time =     237.18 ms /   232 tokens
Log end
smol jart@luna:~/llamafile$ ls -hal /weights/SmolLM-135M.BF16.gguf
-rw-rw-r-- 1 jart jart 259M Jul 22 10:29 /weights/SmolLM-135M.BF16.gguf
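
As a quick sanity check, the eval rate in the log follows directly from the reported numbers (29 tokens generated in 185.74 ms):

# eval-rate check using the figures from the log above
eval_tokens = 29
eval_time_ms = 185.74
print(eval_tokens / (eval_time_ms / 1000.0))  # ~156.1 tokens per second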

Thank you! Approved!

@jart merged commit cc30400 into Mozilla-Ocho:main on Jul 22, 2024
2 checks passed
@Stillerman (Contributor, Author)

Llamafiles for all SmolLM models can be found here.

@jart (Collaborator) commented Jul 23, 2024

Nice!
