
Llama 3.2 1B Instruct Server Setup

Local FastAPI server using Llama 3.2 1B Instruct for response generation, accessible via REST API.
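
For orientation, the sketch below shows one way such a server can be put together. It is a hedged example only: the repo's actual service.py may use different routes, parameters, and generation settings.

# sketch_service.py - a minimal, hypothetical version of a Llama 3.2 1B Instruct server.
# The repo's actual service.py may differ.
import torch
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"  # gated model - requires approved Hugging Face access

# Load the model once at startup and reuse it for every request
generator = pipeline(
    "text-generation",
    model=MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

app = FastAPI()

class Prompt(BaseModel):
    text: str
    max_new_tokens: int = 256

@app.post("/generate")  # hypothetical route - check service.py for the real one
def generate(prompt: Prompt):
    messages = [{"role": "user", "content": prompt.text}]
    result = generator(messages, max_new_tokens=prompt.max_new_tokens)
    # With chat-style input, generated_text holds the full conversation;
    # the last message is the model's reply.
    return {"response": result[0]["generated_text"][-1]["content"]}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=10000)  # port matches the SERVER_URL example below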

Install Required Packages:

  • Method 1: pip install -r requirements.txt

  • Method 2: pip install fastapi uvicorn accelerate transformers

To download Llama 3.2, first request access from Meta via the model page on Meta's Hugging Face repo.
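
Once access is approved, the transformers library still needs your Hugging Face token to fetch the gated weights. One way to provide it (a small sketch, assuming you already have a token):

# Authenticate with Hugging Face so the gated Llama 3.2 weights can be downloaded
from huggingface_hub import login

login(token="hf_...")  # paste your own token; running `huggingface-cli login` in a shell also works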

Install PyTorch:

  1. If you are installing on a CUDA-accelerated device, check your system's CUDA version first: nvidia-smi

  2. Visit https://pytorch.org/ and install the build that matches your CUDA version, e.g. pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124. A quick check that PyTorch can see your GPU follows this list.
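
After installing, this small sanity-check sketch confirms that the PyTorch build was compiled with CUDA support and can see your GPU:

# Verify the PyTorch install and CUDA visibility
import torch

print(torch.__version__)                  # e.g. a +cu124 suffix for a CUDA 12.4 build
print(torch.cuda.is_available())          # True if CUDA is usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the detected GPU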

To run the Llama server:

  1. Run in one terminal: python service.py
  2. Run in another terminal: python tests/test_llm_remote.py (a rough sketch of the kind of request it sends follows this list)
  3. If you are running the uvicorn server on a different device, create a .env file in the root of your project folder and add the SERVER_URL variable, e.g. SERVER_URL=http://192.168.1.250:10000
  4. If you just want to test Llama responses without setting up the server, run python tests/test_llm_local.py
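
For reference, a request to the running server can look roughly like the sketch below. The /generate route and payload shape are assumptions here, not the repo's actual API; see tests/test_llm_remote.py for the real client code.

# Minimal client sketch - endpoint path and JSON schema are assumptions
import os
import requests

server_url = os.getenv("SERVER_URL", "http://localhost:10000")
payload = {"text": "Write a haiku about local LLM servers.", "max_new_tokens": 128}
response = requests.post(f"{server_url}/generate", json=payload, timeout=120)
print(response.json())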

To add this server as a submodule of a parent repo:

# Add as a submodule
git submodule add https://github.com/IshanG97/llama_server.git llama_server

# Update when cloning parent repo for the first time
git submodule update --init --recursive
