A complete LLM ChatBot UI and server, all running on your laptop (tested on macOS):
```mermaid
graph LR;
    subgraph laptop [Your Laptop Environment]
        ChatBot[ChatBot UI] -- HTTP --> LLM[LLM Web Server];
    end
```
- Runs a ChatGPT-like LLM (e.g. Llama 2) locally (using llama-cpp-python)
- Runs a ChatGPT-like UI/app locally (using chainlit)
This loosely follows the setup steps from https://llama-cpp-python.readthedocs.io/en/latest/install/macos/.
This code has been tested on Python version 3.9.16. Pyenv is a great way to install Python.
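For example, to install and select that version with pyenv (a minimal sketch, assuming pyenv is already installed):

```sh
# Install Python 3.9.16 and use it for this project directory
pyenv install 3.9.16
pyenv local 3.9.16
```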
Set up a virtual environment and install Python requirements:
```sh
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
```
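The authoritative dependency list is the `requirements.txt` in this repo; based on the tools referenced in this README, it should include at least something along these lines (a sketch, not the actual pinned file):

```text
# Likely core dependencies (assumed from the tools mentioned in this README)
llama-cpp-python[server]   # local Llama inference plus the llama_cpp.server module
chainlit                   # ChatGPT-like UI
huggingface-hub            # provides the huggingface-cli download command
```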
Model download command:
```sh
huggingface-cli download \
  TheBloke/Llama-2-7B-Chat-GGUF \
  llama-2-7b-chat.Q5_K_M.gguf \
  --local-dir ./models/llama-7b-chat/ \
  --local-dir-use-symlinks False
```
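Once the download finishes, the model file should be sitting in the local directory (it is a multi-gigabyte file), which you can confirm with:

```sh
ls -lh ./models/llama-7b-chat/
```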
GGUF is the model file format llama.cpp (and therefore llama-cpp-python) uses for local LLM inference, including on macOS. To use a different model:
- Update the above download command to use a different huggingface repo and filename (must be in GGUF format).
- Update the `run-server.sh` script's `MODEL` variable with the path to the `.gguf` file after download.
In one terminal, activate the virtualenv and run the model server with:
```sh
./run-server.sh
```
This will run the Llama 2 7B chat model inference server (by default on port 8000).
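The authoritative contents of `run-server.sh` are in this repo; conceptually it amounts to something like the following sketch (the model path matches the download command above, everything else is an assumption):

```sh
#!/usr/bin/env bash
# Sketch of run-server.sh: serve the downloaded GGUF model with llama_cpp.server
# (port 8000 is the default mentioned above).
MODEL=./models/llama-7b-chat/llama-2-7b-chat.Q5_K_M.gguf
python -m llama_cpp.server --model "$MODEL" --port 8000
```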
In another terminal, activate the virtualenv and run the UI with:
```sh
./run-ui.sh
```
This will open a ChatGPT-like UI in your browser using Chainlit (by default on port 8001), connected to the inference server.
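`run-ui.sh` presumably wraps the Chainlit CLI; a minimal sketch, assuming the UI entry point is a file named `app.py` (both the filename and the script contents are assumptions):

```sh
#!/usr/bin/env bash
# Sketch of run-ui.sh: launch the Chainlit UI on port 8001 (app.py is an assumed filename).
chainlit run app.py --port 8001
```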
It may be necessary to run `llama_cpp.server` with the `LLAMA_CPP_LIB` environment variable overridden to point at the `.dylib` file that the `llama-cpp-python` package builds in the environment. That file can be located by running `find /path/to/venv -name '*.dylib'` in your venv (or Conda env).
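For example, a sketch of setting the override before starting the server (`$VIRTUAL_ENV` is set by `source venv/bin/activate`; the exact `.dylib` name and location vary by `llama-cpp-python` version):

```sh
# Point llama-cpp-python at the pre-built shared library inside the virtualenv,
# then start the inference server as usual.
export LLAMA_CPP_LIB="$(find "$VIRTUAL_ENV" -name '*.dylib' | head -n 1)"
./run-server.sh
```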
Attempting to build the llama.cpp C++ library from scratch using https://github.com/ggerganov/llama.cpp resulted in a segmentation fault for me when running the `llama_cpp.server` module, so the `LLAMA_CPP_LIB` override with the pre-built library file seems to be more stable.