running it on mac #88

Closed
milsun opened this issue Aug 16, 2024 · 5 comments

Comments

milsun commented Aug 16, 2024

How can I run it on my Mac M2? llama.cpp doesn't support this yet.

zqhuang211 (Contributor) commented Aug 19, 2024

Currently, inference works on Apple Silicon (cpu or mps), though it's significantly slower than on Nvidia GPUs.

If you'd like to try out the Gradio demo on a Mac, simply run `just gradio`. By default, this will use mps for inference.

Training with `--device=mps` still needs some work; see #91.
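
For reference, a minimal sketch of the usual PyTorch device fallback on Apple Silicon (this is generic PyTorch, not the exact Ultravox code path):

```python
import torch

# Prefer mps when available, otherwise fall back to cpu.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

# model = load_model(...)  # hypothetical loader; load the model as usual
# model.to(device)
print(f"Running inference on: {device}")
```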

zqhuang211 (Contributor)

Ultravox doesn’t support llama.cpp yet. We plan to explore this in the future and definitely welcome contributions from the community.

milsun (Author) commented Aug 20, 2024

Running it with "mps" would be fp32, I guess, so it would need a ton of RAM, right? Like ~32GB or so?

zqhuang211 (Contributor)

Yes, for float32. You can also set `--data_type=float16`, which reduces RAM usage to around 20GB. We haven't tried quantization yet.
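
As a rough back-of-the-envelope check (assuming an ~8B-parameter backbone, which is an assumption on my part rather than a figure from this thread), the weight memory alone works out roughly to the numbers discussed above:

```python
# Rough weight-memory estimate. The 8B parameter count is an assumption;
# actual usage is higher because of activations, the audio encoder, and framework overhead.
params = 8e9

fp32_gb = params * 4 / 1e9  # float32: 4 bytes per parameter -> ~32 GB
fp16_gb = params * 2 / 1e9  # float16: 2 bytes per parameter -> ~16 GB (plus overhead, hence ~20 GB in practice)

print(f"fp32 weights: ~{fp32_gb:.0f} GB")
print(f"fp16 weights: ~{fp16_gb:.0f} GB")
```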

juberti (Contributor) commented Aug 21, 2024

Fixed in #95

juberti closed this as completed Aug 21, 2024