Running llama.cpp directly on iOS devices #4423
-
For anyone who would like to try this out, I made a fork that uses Phi3 3.8B as the default model. It is much smaller (2.2GB) and hence much faster for inference on weaker devices, yet powerful enough to test most use cases. I'm building a simple nutrition-counting app, and it works just fine. @philippzagar, in case you still maintain this project, I think Phi3 or another smaller model would be a better default option.
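Hypothetically, fetching a smaller GGUF model on-device could look like the sketch below. The Hugging Face URL in the comment references Microsoft's Phi-3 GGUF release and is purely illustrative; the fork's actual download mechanism may differ.

```swift
import Foundation

// Hypothetical helper: download a smaller GGUF (e.g. a quantized Phi-3 mini,
// roughly 2.2GB) into the app's Application Support directory so the
// llama.cpp wrapper can load it from local disk.
func downloadModel(from url: URL) async throws -> URL {
    // URLSession's async download API writes to a temporary file first.
    let (tempURL, _) = try await URLSession.shared.download(from: url)

    let supportDir = try FileManager.default.url(
        for: .applicationSupportDirectory,
        in: .userDomainMask,
        appropriateFor: nil,
        create: true
    )
    let destination = supportDir.appendingPathComponent(url.lastPathComponent)

    // Replace any previously downloaded copy.
    if FileManager.default.fileExists(atPath: destination.path) {
        try FileManager.default.removeItem(at: destination)
    }
    try FileManager.default.moveItem(at: tempURL, to: destination)
    return destination
}

// Illustrative usage with a quantized Phi-3 GGUF from Hugging Face:
// let modelURL = URL(string: "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf")!
// let localPath = try await downloadModel(from: modelURL)
```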
-
@philippzagar this is a fantastic project: you've nailed both the simplicity for developers integrating with llama.cpp and the UX for users chatting with an LLM. I was curious whether you plan to continue maintaining this (i.e., keeping https://github.com/StanfordBDHG/llama.cpp synced with upstream).
-
For my Master's thesis in the digital health field, I developed a Swift package that encapsulates llama.cpp, offering a streamlined, easy-to-use Swift API for developers. The SpeziLLM package, entirely open source, is accessible within the Stanford Spezi ecosystem: StanfordSpezi/SpeziLLM (specifically, the `SpeziLLMLocal` target).

Internally, SpeziLLM leverages a precompiled XCFramework version of llama.cpp. We chose this approach because consuming llama.cpp via the `Package.swift` file provided in the repo requires `unsafeFlags(_:)`, which prevents semantic versioning via SPM, as discussed in the Swift community forum and on StackOverflow. By compiling llama.cpp into an XCFramework and exposing it as a `binaryTarget(_:)` in SPM, we enable proper semantic versioning of the package. You can explore the complete source code and the respective GitHub Actions here: StanfordBDHG/llama.cpp.

I welcome any feedback on the implementation, particularly concerning the llama.cpp inference (take a closer look at this source file).
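For context, the XCFramework approach described above boils down to a manifest roughly like the sketch below. The binary target's URL and checksum are placeholders, not the actual artifacts published by StanfordBDHG/llama.cpp.

```swift
// swift-tools-version: 5.9
import PackageDescription

let package = Package(
    name: "SpeziLLM",
    platforms: [.iOS(.v16), .macOS(.v13)],
    products: [
        .library(name: "SpeziLLMLocal", targets: ["SpeziLLMLocal"])
    ],
    targets: [
        // Precompiled llama.cpp distributed as an XCFramework. Because the
        // binary is prebuilt, no unsafeFlags(_:) are needed in this manifest,
        // so downstream packages can depend on it with a semantic version.
        // URL and checksum below are placeholders.
        .binaryTarget(
            name: "llama",
            url: "https://example.com/llama.xcframework.zip",
            checksum: "0000000000000000000000000000000000000000000000000000000000000000"
        ),
        .target(
            name: "SpeziLLMLocal",
            dependencies: ["llama"]
        )
    ]
)
```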
An example workflow using the Llama 2 7B model on an iPhone 15 Pro with 6GB of main memory looks like this (the SpeziLLM repo includes this example as a UI test application):
[Video: SpeziLLM.mp4]
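For readers who prefer code over video, a hypothetical sketch of driving such a wrapper follows. The `LocalLLM` type and `generate(prompt:)` method are illustrative placeholders (stubbed with canned tokens), not the actual SpeziLLM API; consult the SpeziLLM repository for the real interface.

```swift
import Foundation

// Hypothetical wrapper API, for illustration only. The real interface lives
// in the SpeziLLMLocal target of StanfordSpezi/SpeziLLM.
struct LocalLLM {
    let modelPath: URL

    // Streams generated tokens one at a time, mirroring llama.cpp's
    // token-by-token decoding loop. Stubbed here with canned tokens.
    func generate(prompt: String) -> AsyncThrowingStream<String, Error> {
        AsyncThrowingStream { continuation in
            for token in ["Hello", ",", " world", "!"] {
                continuation.yield(token)
            }
            continuation.finish()
        }
    }
}

// Usage: stream the response into the UI as tokens arrive, instead of
// blocking until the full completion is ready.
let llm = LocalLLM(modelPath: URL(fileURLWithPath: "/path/to/model.gguf"))
for try await token in llm.generate(prompt: "What should I eat today?") {
    print(token, terminator: "")
}
```

Streaming tokens as they are produced is what keeps the chat UI responsive on-device, since a full 7B-model completion can take many seconds on a phone.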