
Mac Metal: doc example still runs on CPU #1064

Closed
jochy opened this issue Sep 15, 2023 · 8 comments · Fixed by #1365
Labels: bug (Something isn't working), kind/documentation (Improvements or additions to documentation)

@jochy

jochy commented Sep 15, 2023

LocalAI version:
Last commit on master (8ccf5b2)

Environment, CPU architecture, OS, and Version:
MacBook M2 Max, 64 GB memory, Sonoma beta 7

Describe the bug
I have followed the documentation to build and run LocalAI with Metal support. However, the example in the documentation still runs on the CPU.

To Reproduce
Follow the documentation: https://localai.io/basics/build/#build-on-mac
See that the CPU is used, not the GPU.

Expected behavior
GPU should be used
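
For reference, the discussion below converges on a config-level fix. A minimal sketch of a Metal-enabled model config, assuming the llama-cpp backend and an illustrative model name (this is not the documentation's own example), looks roughly like this:

# Sketch of a Metal-enabled model config; the name and values are illustrative assumptions.
name: my-metal-model
# llama-cpp backend, as recommended later in this thread
backend: llama-cpp
parameters:
  # relative to the models path
  model: llama-2-13b-chat.Q4_0.gguf
# Metal-related options live at the top level, not under parameters
f16: true
gpu_layers: 1
context_size: 4096

As mentioned below, the documentation suggests gpu_layers: 1; the crucial detail worked out in the comments is that gpu_layers must not be nested under parameters.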

@jochy added the bug label Sep 15, 2023
@Aisuko added the kind/documentation and enhancement labels and removed the bug label Sep 17, 2023
@Aisuko
Collaborator

Aisuko commented Sep 17, 2023

Hi @jochy, thanks for your feedback. You are right, we should add metal as the default build type for M1 and M2. Have you succeeded in building LocalAI with make BUILD_TYPE=metal and running it?

@jochy
Author

jochy commented Sep 23, 2023

Hey @Aisuko, I tried to build it locally using the BUILD_TYPE=metal flag, but it failed due to some compilation issues. I don't know whether it's related to my local environment or whether it fails for everyone.

One thing that could help would be a GitHub Actions workflow that compiles the code using the BUILD_TYPE=metal flag.

EDIT: I built the latest commit on master with BUILD_TYPE=metal and it now works. I'll check whether it is now using the GPU.

@Aisuko self-assigned this Sep 25, 2023
@localai-bot
Contributor

⚠️⚠️⚠️⚠️⚠️

Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!

but.... I can also be funny or helpful 😸 and I can generally provide good tips or point to places to look in the documentation or the code, based on what you wrote in the issue.

Don't engage in conversation with me,
I don't support (yet) replying!

⚠️⚠️⚠️⚠️⚠️

I'm sorry to hear that you are having trouble building and running LocalAI on your MacBook M2 Max. I can see from the logs that you have followed the documentation to build and run LocalAI with Metal support, but the example still runs on the CPU instead of the GPU.
To help debug this issue further, could you please provide me with the following information:

  1. What version of macOS are you running?
  2. What version of Xcode are you using?
  3. Have you installed the necessary dependencies (CMake and GCC) as mentioned in the documentation?
  4. Are you able to successfully build the LocalAI container image locally? If so, what command did you use and what is the output?
  5. When you run the example with the --gpus all flag, do you see any error messages or logs that might indicate why the GPU is not being used?

Once I have this information, I can help troubleshoot the issue further and suggest solutions or assist with debugging.


@mudler added the bug label and removed the enhancement label Sep 30, 2023
@Aisuko
Collaborator

Aisuko commented Oct 2, 2023

Hi @jochy, is it still running on the CPU?

@mozg31337

Hello guys.
I am using LocalAI on a MacBook Pro M1 with 64 GB (yesterday's git pull). I built it following the instructions for the M1/M2 build, and it compiled successfully with BUILD_TYPE=metal. However, when I run models I still see CPU usage and not the GPU. I've used the following model files:

llama-2-13b-chat.Q4_0.yaml
name: llama-2-13b-chat.Q4_0
parameters:
  model: llama-2-13b-chat.Q4_0
  gpu_layers: 24
f16: true
context_size: 4096
model_type: llama

(I tried setting gpu_layers: 1 as per the documentation and there is no difference.)

llama-2-13b-chat.Q4_0.tmpl
{{if eq .RoleName "assistant"}}{{.Content}}{{else}}
[INST]
{{if .SystemPrompt}}{{.SystemPrompt}}{{else if eq .RoleName "system"}}<>{{.Content}}<>

{{else if .Content}}{{.Content}}{{end}}
[/INST]
{{end}}

When I use this model from text-generation-webui, for example, I can see the GPU is being fully utilised. Please let me know what I am doing wrong. Cheers

@mozg31337

Sorry, spoke too soon. The indentation in the YAML config was wrong: gpu_layers: 24 shouldn't be nested under parameters, but should be a standalone top-level option. Now it works with the GPU.
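
For clarity, a sketch of the corrected config, with gpu_layers moved out of parameters to the top level (the other values are copied from the config above):

llama-2-13b-chat.Q4_0.yaml
name: llama-2-13b-chat.Q4_0
parameters:
  # relative to the models path
  model: llama-2-13b-chat.Q4_0
# gpu_layers is a standalone top-level option, not a child of parameters
gpu_layers: 24
f16: true
context_size: 4096
model_type: llama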

@Aisuko
Collaborator

Aisuko commented Oct 16, 2023

Which example are you using? I cannot find the configuration.

@loyaliu

loyaliu commented Nov 4, 2023

You should use llama-cpp (not llama) as the backend. Here is an example:

name: gpt-3.5-turbo
# Default model parameters

parameters:
  # Relative to the models path
  model: mistral-7b-code-16k-qlora.Q4_0.gguf
  # temperature
  temperature: 0.3
  # all the OpenAI request options here..

gpu_layers: 1
f16: true
# Default context size
context_size: 512
threads: 10
# Define a backend (optional). By default it will try to guess the backend the first time the model is interacted with.
backend: llama-cpp
# available: llama, stablelm, gpt2, gptj, rwkv

# Enable prompt caching
# prompt_cache_path: "alpaca-cache"
# prompt_cache_all: true

# stopwords (if supported by the backend)
stopwords:
- "HUMAN:"
- "### Response:"
# define chat roles
roles:
  assistant: '### Response:'
  system: '### System Instruction:'
  user: '### Instruction:'
template:
  # template file ".tmpl" with the prompt template to use by default on the endpoint call. Note there is no extension in the files
  completion: xwin-completion
  chat: xwin-chat

@mudler linked a pull request Nov 30, 2023 that will close this issue