Local FastAPI server using Llama 3.2 1B Instruct for response generation, accessible via REST API.
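A minimal sketch of how a service like this can be put together with FastAPI and `transformers`; the `/generate` endpoint name, request schema, and generation settings below are illustrative assumptions, not the actual contents of `service.py`:

```python
# Illustrative sketch only - the real entry point is service.py; the endpoint
# name, request schema, and generation settings here are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load the instruct-tuned 1B model once at startup (requires access to Meta's gated repo)
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    device_map="auto",
)

class Prompt(BaseModel):
    text: str

@app.post("/generate")
def generate(prompt: Prompt) -> dict:
    # Chat-format input; the pipeline returns the conversation with the assistant reply appended
    messages = [{"role": "user", "content": prompt.text}]
    result = generator(messages, max_new_tokens=256)
    return {"response": result[0]["generated_text"][-1]["content"]}
```

An app shaped like this would be started with something like `uvicorn service:app --host 0.0.0.0 --port 10000` (the port matching the `SERVER_URL` example below).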
Install Required Packages:
- Method 1: `pip install -r requirements.txt`
- Method 2: `pip install fastapi uvicorn accelerate transformers`
Note: you need to request access from Meta via the model's Hugging Face repo before you can download Llama.
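Once access has been granted, log in locally so `transformers` can download the gated weights, e.g. with the Hugging Face CLI (shipped with the `huggingface_hub` dependency):

```
huggingface-cli login
```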
Install PyTorch:
- If you are doing this on a CUDA-accelerated device, check your system's CUDA version: `nvidia-smi`
- Visit https://pytorch.org/ and install the appropriate version, e.g. `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124`
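To confirm the install can see your GPU before starting the server, a quick check:

```
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```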
To run the Llama server:
- Run in one terminal: `python service.py`
- Run in another terminal: `python tests/test_llm_remote.py`
- If you are running the `uvicorn` server on a different device, create a `.env` file in the root of your project folder and add the `SERVER_URL` variable, e.g. `SERVER_URL=http://192.168.1.250:10000` (see the request example after this list).
- If you merely want to test Llama responses without setting up the server, run `python tests/test_llm_local.py`
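To call the server from your own code instead of the test scripts, a request could look like the following; the `/generate` path and JSON shape reuse the assumptions from the sketch at the top of this README, and `requests` is assumed to be installed (`pip install requests`):

```python
import os

import requests

# Use SERVER_URL from the environment (e.g. loaded from .env); default to localhost
server_url = os.getenv("SERVER_URL", "http://127.0.0.1:10000")

response = requests.post(
    f"{server_url}/generate",
    json={"text": "Give me a one-line summary of FastAPI."},
    timeout=120,
)
response.raise_for_status()
print(response.json())
```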
To add to parent repo:
- Add as a submodule: `git submodule add https://github.com/IshanG97/llama_server.git llama_server`
- Update when cloning the parent repo for the first time: `git submodule update --init --recursive`
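To pull in newer commits of the submodule later, run this from the parent repo:

```
# Fetch and check out the submodule's latest tracked commit
git submodule update --remote llama_server
```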