Supports SmolLM #495
Conversation
Wow thanks for sending this. I checked out your llama.cpp PR and I'm having trouble creating a GGUF file. Any ideas?
I ran this command:
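(The exact command isn't preserved in this view; as a rough sketch, converting a local SmolLM checkpoint with llama.cpp's converter usually looks something like the following, where the script name, paths, and --outtype are assumptions rather than the command actually run here.)

# Hypothetical sketch only; not the exact command from this conversation.
# Assumes a local Hugging Face checkout of SmolLM-135M and llama.cpp's
# convert_hf_to_gguf.py converter.
python convert_hf_to_gguf.py ./SmolLM-135M \
    --outfile SmolLM-135M.BF16.gguf \
    --outtype bf16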
Hmmm, looking into this now.
Could you try with a fresh copy of llama.cpp at
and then run
I was able to then run
and it seems to run inference.
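(For readers following along, the sequence being described is roughly the following; the commit hash, paths, and flags are assumptions, since the originals are not shown above.)

# Hypothetical sketch of the suggested steps; the specific commit and exact
# commands were not preserved in this thread.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
pip install -r requirements.txt          # converter dependencies
python convert_hf_to_gguf.py /path/to/SmolLM-135M \
    --outfile /weights/SmolLM-135M.BF16.gguf --outtype bf16
# then run the resulting GGUF with a llamafile binary:
./llamafile -m /weights/SmolLM-135M.BF16.gguf -p "Hello" -n 32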
Fantastic. This model goes wicked fast on CPU.
llama_print_timings: load time = 59.98 ms
llama_print_timings: sample time = 1.16 ms / 30 runs ( 0.04 ms per token, 25862.07 tokens per second)
llama_print_timings: prompt eval time = 45.54 ms / 203 tokens ( 0.22 ms per token, 4457.82 tokens per second)
llama_print_timings: eval time = 185.74 ms / 29 runs ( 6.40 ms per token, 156.14 tokens per second)
llama_print_timings: total time = 237.18 ms / 232 tokens
Log end
smol jart@luna:~/llamafile$ ls -hal /weights/SmolLM-135M.BF16.gguf
-rw-rw-r-- 1 jart jart 259M Jul 22 10:29 /weights/SmolLM-135M.BF16.gguf
Thank you! Approved!
Llamafiles for all SmolLM models can be found here.
Nice!
These changes are needed to make a GGUF for SmolLM work with llamafile. The GGUF was generated with this PR for llama.cpp.
Tested with
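(The test invocation itself isn't shown; a representative smoke test, with an illustrative prompt and flags rather than the actual command used, would look roughly like this.)

# Illustrative only; not the actual test command from this PR.
./llamafile -m SmolLM-135M.BF16.gguf -p "Once upon a time" -n 64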