running it on mac #88

Closed
milsun opened this issue Aug 16, 2024 · 5 comments

Comments

milsun commented Aug 16, 2024

How can I run it on my Mac M2? llama.cpp doesn't support this yet.

zqhuang211 (Contributor) commented Aug 19, 2024

Currently, inference works on Apple Silicon (cpu or mps), though it's significantly slower than on Nvidia GPUs.

If you'd like to try out the Gradio demo on a Mac, simply run `just gradio`. By default, this will use mps for inference.

Training with `--device=mps` still needs some work; see #91.
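
For reference, a minimal sketch of the usual PyTorch device fallback on Apple Silicon (this is generic PyTorch, not the exact Ultravox code path):

```python
import torch

# Prefer mps when available, otherwise fall back to cpu.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

# model = load_model(...)  # hypothetical loader; load the model as usual
# model.to(device)
print(f"Running inference on: {device}")
```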

zqhuang211 (Contributor)

Ultravox doesn’t support llama.cpp yet. We plan to explore this in the future and definitely welcome contributions from the community.

milsun (Author) commented Aug 20, 2024

Running it with "mps" would be fp32, I guess, so it would need a ton of RAM, right? Like ~32GB or so?

zqhuang211 (Contributor)

Yes, for float32. You can also set `--data_type=float16`, which reduces RAM usage to around 20GB. We haven't tried quantization yet.
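
As a rough back-of-the-envelope check (assuming an ~8B-parameter backbone, which is an assumption on my part rather than a figure from this thread), the weight memory alone works out roughly to the numbers discussed above:

```python
# Rough weight-memory estimate. The 8B parameter count is an assumption;
# actual usage is higher because of activations, the audio encoder, and framework overhead.
params = 8e9

fp32_gb = params * 4 / 1e9  # float32: 4 bytes per parameter -> ~32 GB
fp16_gb = params * 2 / 1e9  # float16: 2 bytes per parameter -> ~16 GB (plus overhead, hence ~20 GB in practice)

print(f"fp32 weights: ~{fp32_gb:.0f} GB")
print(f"fp16 weights: ~{fp16_gb:.0f} GB")
```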

juberti (Contributor) commented Aug 21, 2024

Fixed in #95

juberti closed this as completed Aug 21, 2024