Mac Metal: doc example still runs on CPU #1064
Comments
Hi @jochy, thanks for your feedback. You are right, we should add …
Hey @Aisuko, I have tried to build it locally using the …

One thing that could help is to have a GitHub Actions workflow which compiles the code using the …

EDIT: I tried to build the last commit on master with …
Hi @jochy, is the CPU still being used?
Hello guys.

llama-2-13b-chat.Q4_0.yaml (I tried setting gpu_layers: 1 as per the documentation and there is no difference.)

llama-2-13b-chat.Q4_0.tmpl:

{{else if .Content}}{{.Content}}{{end}}

When I use this model from text-generation-webui, for example, I can see the GPU is being fully utilised. Please let me know what I am doing wrong. Cheers
Sorry, spoke too soon. The indentation in the YAML config was wrong: gpu_layers: 24 shouldn't be nested under parameters, but should be a standalone top-level option. Now it works with the GPU.
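For illustration, here is a minimal sketch of that indentation difference, based on the two comments above. The model name and the .gguf filename are guesses (the actual YAML contents were not shown); everything else in the config is omitted.

```yaml
# Wrong: gpu_layers nested under parameters was apparently not picked up
# (per the comments above), so inference stayed on the CPU.
name: llama-2-13b-chat
parameters:
  model: llama-2-13b-chat.Q4_0.gguf
  gpu_layers: 24

---
# Right: gpu_layers (and f16) as standalone top-level options.
name: llama-2-13b-chat
parameters:
  model: llama-2-13b-chat.Q4_0.gguf
gpu_layers: 24
f16: true
```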
Which example are you using? I cannot find that configuration.
You should use llama-cpp (not llama) as the backend. This is an example:

name: gpt-3.5-turbo
# Default model parameters
parameters:
  # Relative to the models path
  model: mistral-7b-code-16k-qlora.Q4_0.gguf
  # temperature
  temperature: 0.3
  # all the OpenAI request options here..

gpu_layers: 1
f16: true
# Default context size
context_size: 512
threads: 10
# Define a backend (optional). By default it will try to guess the backend the first time the model is interacted with.
backend: llama-cpp
# available: llama, stablelm, gpt2, gptj rwkv

# Enable prompt caching
# prompt_cache_path: "alpaca-cache"
# prompt_cache_all: true

# stopwords (if supported by the backend)
stopwords:
- "HUMAN:"
- "### Response:"

# define chat roles
roles:
  assistant: '### Response:'
  system: '### System Instruction:'
  user: '### Instruction:'
template:
  # template file ".tmpl" with the prompt template to use by default on the endpoint call. Note there is no extension in the files
  completion: xwin-completion
  chat: xwin-chat
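As a usage sketch, assuming LocalAI is listening on its default port 8080 and the config above is saved in the models directory, a request like the following should be served by the GPU-offloaded model. The prompt is just a placeholder; "gpt-3.5-turbo" matches the name: field in the YAML above.

```sh
# Query LocalAI's OpenAI-compatible chat completions endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Write a haiku about Metal."}],
        "temperature": 0.3
      }'
```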
LocalAI version:
Last commit on master (8ccf5b2)
Environment, CPU architecture, OS, and Version:
MacBook M2 Max, 64 GB memory, macOS Sonoma beta 7
Describe the bug
I have followed the documentation to build and run LocalAI with Metal support. However, the example in the documentation still runs on the CPU.
To Reproduce
Follow the documentation: https://localai.io/basics/build/#build-on-mac
See that the CPU is used, not the GPU.
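For context, a rough sketch of the build steps from that documentation page follows, assuming the Makefile still exposes a BUILD_TYPE=metal target and that the repo path has not moved since; check the linked page for the exact commands.

```sh
# Clone and build LocalAI with Metal (Apple GPU) support.
git clone https://github.com/go-skynet/LocalAI.git
cd LocalAI
make BUILD_TYPE=metal build

# Then set `gpu_layers: 1` (or higher) and `f16: true`
# as top-level options in the model's YAML config.
```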
Expected behavior
GPU should be used
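One way to check whether inference actually hits the GPU, assuming macOS's built-in powermetrics tool, is to watch GPU activity while a request is being generated:

```sh
# Sample GPU power/utilisation once per second.
# If Metal offloading works, GPU activity should rise during generation.
sudo powermetrics --samplers gpu_power -i 1000
```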