Running llama.cpp directly on iOS devices #4423
-
For anyone who would like to try this out, I made a fork that uses Phi3 3.8B as the default model. It is much smaller (2.2GB) and hence much faster for inference on weaker devices, yet powerful enough to test most use cases. I'm building a simple nutrition-counting app, and it works just fine. @philippzagar, in case you still maintain this project, I think Phi3 or another smaller model would be a better default option.
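Hypothetically, fetching a smaller GGUF model on-device could look like the sketch below. The Hugging Face URL in the comment references Microsoft's Phi-3 GGUF release and is purely illustrative; the fork's actual download mechanism may differ.

```swift
import Foundation

// Hypothetical helper: download a smaller GGUF (e.g. a quantized Phi-3 mini,
// roughly 2.2GB) into the app's Application Support directory so the
// llama.cpp wrapper can load it from local disk.
func downloadModel(from url: URL) async throws -> URL {
    // URLSession's async download API writes to a temporary file first.
    let (tempURL, _) = try await URLSession.shared.download(from: url)

    let supportDir = try FileManager.default.url(
        for: .applicationSupportDirectory,
        in: .userDomainMask,
        appropriateFor: nil,
        create: true
    )
    let destination = supportDir.appendingPathComponent(url.lastPathComponent)

    // Replace any previously downloaded copy.
    if FileManager.default.fileExists(atPath: destination.path) {
        try FileManager.default.removeItem(at: destination)
    }
    try FileManager.default.moveItem(at: tempURL, to: destination)
    return destination
}

// Illustrative usage with a quantized Phi-3 GGUF from Hugging Face:
// let modelURL = URL(string: "https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf")!
// let localPath = try await downloadModel(from: modelURL)
```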
-
@philippzagar this is a fantastic project: you've nailed both the simplicity for developers integrating with llama.cpp and the UX for users chatting with an LLM. I was curious whether you plan to continue maintaining this (i.e., keeping https://github.com/StanfordBDHG/llama.cpp synced with upstream).
-
For my Master's thesis in the digital health field, I developed a Swift package that encapsulates llama.cpp, offering a streamlined, easy-to-use Swift API for developers. The SpeziLLM package, entirely open source, is accessible within the Stanford Spezi ecosystem: StanfordSpezi/SpeziLLM (specifically, the `SpeziLLMLocal` target).

Internally, SpeziLLM leverages a precompiled XCFramework version of llama.cpp. We chose this approach because consuming llama.cpp via the `Package.swift` file provided in the repo requires `unsafeFlags(_:)`, which prevents semantic versioning via SPM, as discussed in the Swift community forum and on StackOverflow. By compiling llama.cpp into an XCFramework and exposing it as a `binaryTarget(_:)` in SPM, we enable proper semantic versioning of the package. You can explore the complete source code and the respective GitHub Actions here: StanfordBDHG/llama.cpp.

I welcome any feedback on the implementation, particularly concerning the llama.cpp inference (take a closer look at this source file).
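For context, the XCFramework approach described above boils down to a manifest roughly like the sketch below. The binary target's URL and checksum are placeholders, not the actual artifacts published by StanfordBDHG/llama.cpp.

```swift
// swift-tools-version: 5.9
import PackageDescription

let package = Package(
    name: "SpeziLLM",
    platforms: [.iOS(.v16), .macOS(.v13)],
    products: [
        .library(name: "SpeziLLMLocal", targets: ["SpeziLLMLocal"])
    ],
    targets: [
        // Precompiled llama.cpp distributed as an XCFramework. Because the
        // binary is prebuilt, no unsafeFlags(_:) are needed in this manifest,
        // so downstream packages can depend on it with a semantic version.
        // URL and checksum below are placeholders.
        .binaryTarget(
            name: "llama",
            url: "https://example.com/llama.xcframework.zip",
            checksum: "0000000000000000000000000000000000000000000000000000000000000000"
        ),
        .target(
            name: "SpeziLLMLocal",
            dependencies: ["llama"]
        )
    ]
)
```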
An example workflow using the Llama 2 7B model on an iPhone 15 Pro with 6GB of main memory looks like this (the SpeziLLM repo includes this example as a UI test application):
[Video: SpeziLLM.mp4]
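For readers who prefer code over video, a hypothetical sketch of driving such a wrapper follows. The `LocalLLM` type and `generate(prompt:)` method are illustrative placeholders (stubbed with canned tokens), not the actual SpeziLLM API; consult the SpeziLLM repository for the real interface.

```swift
import Foundation

// Hypothetical wrapper API, for illustration only. The real interface lives
// in the SpeziLLMLocal target of StanfordSpezi/SpeziLLM.
struct LocalLLM {
    let modelPath: URL

    // Streams generated tokens one at a time, mirroring llama.cpp's
    // token-by-token decoding loop. Stubbed here with canned tokens.
    func generate(prompt: String) -> AsyncThrowingStream<String, Error> {
        AsyncThrowingStream { continuation in
            for token in ["Hello", ",", " world", "!"] {
                continuation.yield(token)
            }
            continuation.finish()
        }
    }
}

// Usage: stream the response into the UI as tokens arrive, instead of
// blocking until the full completion is ready.
let llm = LocalLLM(modelPath: URL(fileURLWithPath: "/path/to/model.gguf"))
for try await token in llm.generate(prompt: "What should I eat today?") {
    print(token, terminator: "")
}
```

Streaming tokens as they are produced is what keeps the chat UI responsive on-device, since a full 7B-model completion can take many seconds on a phone.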