Mac Metal: doc example still runs on CPU #1064
Comments
Hi @jochy, thanks for your feedback. You are right, we should add …
Hey @Aisuko, I have tried to build it locally using the …

One thing that could help is to have a GitHub Actions workflow which compiles the code using the …

EDIT: I tried to build the last commit on master with …
Hi @jochy, is the CPU still being used?
Hello guys.

llama-2-13b-chat.Q4_0.yaml (I tried setting gpu_layers: 1 as per the documentation and there is no difference.)

llama-2-13b-chat.Q4_0.tmpl:

{{else if .Content}}{{.Content}}{{end}}

When I use this model from text-generation-webui, for example, I can see the GPU is being fully utilised. Please let me know what I am doing wrong. Cheers
Sorry, spoke too soon. The indentation in the YAML config was wrong: gpu_layers: 24 shouldn't be nested under parameters, but should be a standalone top-level option. Now it works with the GPU.
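For illustration, here is a minimal sketch of that indentation difference, based on the two comments above. The model name and the .gguf filename are guesses (the actual YAML contents were not shown); everything else in the config is omitted.

```yaml
# Wrong: gpu_layers nested under parameters was apparently not picked up
# (per the comments above), so inference stayed on the CPU.
name: llama-2-13b-chat
parameters:
  model: llama-2-13b-chat.Q4_0.gguf
  gpu_layers: 24

---
# Right: gpu_layers (and f16) as standalone top-level options.
name: llama-2-13b-chat
parameters:
  model: llama-2-13b-chat.Q4_0.gguf
gpu_layers: 24
f16: true
```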
Which example are you using? I cannot find that configuration.
You should use llama-cpp (not llama) as the backend. This is an example:

name: gpt-3.5-turbo
# Default model parameters
parameters:
  # Relative to the models path
  model: mistral-7b-code-16k-qlora.Q4_0.gguf
  # temperature
  temperature: 0.3
  # all the OpenAI request options here..

gpu_layers: 1
f16: true
# Default context size
context_size: 512
threads: 10
# Define a backend (optional). By default it will try to guess the backend the first time the model is interacted with.
backend: llama-cpp
# available: llama, stablelm, gpt2, gptj rwkv

# Enable prompt caching
# prompt_cache_path: "alpaca-cache"
# prompt_cache_all: true

# stopwords (if supported by the backend)
stopwords:
- "HUMAN:"
- "### Response:"

# define chat roles
roles:
  assistant: '### Response:'
  system: '### System Instruction:'
  user: '### Instruction:'
template:
  # template file ".tmpl" with the prompt template to use by default on the endpoint call. Note there is no extension in the files
  completion: xwin-completion
  chat: xwin-chat
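As a usage sketch, assuming LocalAI is listening on its default port 8080 and the config above is saved in the models directory, a request like the following should be served by the GPU-offloaded model. The prompt is just a placeholder; "gpt-3.5-turbo" matches the name: field in the YAML above.

```sh
# Query LocalAI's OpenAI-compatible chat completions endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Write a haiku about Metal."}],
        "temperature": 0.3
      }'
```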
LocalAI version:
Last commit on master (8ccf5b2)
Environment, CPU architecture, OS, and Version:
MacBook M2 Max, 64 GB memory, macOS Sonoma beta 7
Describe the bug
I have followed the documentation to build and run LocalAI with Metal support. However, the example in the documentation still runs on the CPU.
To Reproduce
Follow the documentation: https://localai.io/basics/build/#build-on-mac
See that the CPU is used, not the GPU.
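For context, a rough sketch of the build steps from that documentation page follows, assuming the Makefile still exposes a BUILD_TYPE=metal target and that the repo path has not moved since; check the linked page for the exact commands.

```sh
# Clone and build LocalAI with Metal (Apple GPU) support.
git clone https://github.com/go-skynet/LocalAI.git
cd LocalAI
make BUILD_TYPE=metal build

# Then set `gpu_layers: 1` (or higher) and `f16: true`
# as top-level options in the model's YAML config.
```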
Expected behavior
GPU should be used
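One way to check whether inference actually hits the GPU, assuming macOS's built-in powermetrics tool, is to watch GPU activity while a request is being generated:

```sh
# Sample GPU power/utilisation once per second.
# If Metal offloading works, GPU activity should rise during generation.
sudo powermetrics --samplers gpu_power -i 1000
```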