Local FastAPI server using Llama 3.2 1B Instruct for response generation, accessible via REST API.
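A minimal sketch of how a service like this can be put together with FastAPI and `transformers`; the `/generate` endpoint name, request schema, and generation settings below are illustrative assumptions, not the actual contents of `service.py`:

```python
# Illustrative sketch only - the real entry point is service.py; the endpoint
# name, request schema, and generation settings here are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load the instruct-tuned 1B model once at startup (requires access to Meta's gated repo)
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    device_map="auto",
)

class Prompt(BaseModel):
    text: str

@app.post("/generate")
def generate(prompt: Prompt) -> dict:
    # Chat-format input; the pipeline returns the conversation with the assistant reply appended
    messages = [{"role": "user", "content": prompt.text}]
    result = generator(messages, max_new_tokens=256)
    return {"response": result[0]["generated_text"][-1]["content"]}
```

An app shaped like this would be started with something like `uvicorn service:app --host 0.0.0.0 --port 10000` (the port matching the `SERVER_URL` example below).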
Install Required Packages:
- Method 1: `pip install -r requirements.txt`
- Method 2: `pip install fastapi uvicorn accelerate transformers`
Note: you need to request access from Meta via the model's Hugging Face repo before you can download Llama.
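Once access has been granted, log in locally so `transformers` can download the gated weights, e.g. with the Hugging Face CLI (shipped with the `huggingface_hub` dependency):

```
huggingface-cli login
```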
Install PyTorch:
- If you are doing this on a CUDA-accelerated device, check your system's CUDA version: `nvidia-smi`
- Visit https://pytorch.org/ and install the appropriate version, e.g. `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124`
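To confirm the install can see your GPU before starting the server, a quick check:

```
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```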
To run the Llama server:
- Run in one terminal: `python service.py`
- Run in another terminal: `python tests/test_llm_remote.py`
- If you are running the `uvicorn` server on a different device, create a `.env` file in the root of your project folder and add the `SERVER_URL` variable, e.g. `SERVER_URL=http://192.168.1.250:10000` (see the request example after this list).
- If you merely want to test Llama responses without setting up the server, run `python tests/test_llm_local.py`
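To call the server from your own code instead of the test scripts, a request could look like the following; the `/generate` path and JSON shape reuse the assumptions from the sketch at the top of this README, and `requests` is assumed to be installed (`pip install requests`):

```python
import os

import requests

# Use SERVER_URL from the environment (e.g. loaded from .env); default to localhost
server_url = os.getenv("SERVER_URL", "http://127.0.0.1:10000")

response = requests.post(
    f"{server_url}/generate",
    json={"text": "Give me a one-line summary of FastAPI."},
    timeout=120,
)
response.raise_for_status()
print(response.json())
```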
To add to parent repo:
- Add as a submodule: `git submodule add https://github.com/IshanG97/llama_server.git llama_server`
- Update when cloning the parent repo for the first time: `git submodule update --init --recursive`
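To pull in newer commits of the submodule later, run this from the parent repo:

```
# Fetch and check out the submodule's latest tracked commit
git submodule update --remote llama_server
```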