# LeapfrogAI vLLM Backend

A LeapfrogAI API-compatible vLLM wrapper for quantized and unquantized model inference across GPU infrastructures.

## Usage

### Pre-Requisites

See the LeapfrogAI documentation website for system requirements and dependencies.

#### Dependent Components

### Model Selection

The default model bundled with this backend in this repository's officially released images is a 4-bit quantization of the Synthia-7B model.

You can optionally specify a different model or quantization type using the following Docker build arguments (an example build invocation is sketched after this list):

- `--build-arg HF_HUB_ENABLE_HF_TRANSFER="1"`: enable (`1`) or disable (`0`) accelerated HuggingFace Hub transfers (default: `1`)
- `--build-arg REPO_ID="TheBloke/Synthia-7B-v2.0-GPTQ"`: HuggingFace repository ID for the model
- `--build-arg REVISION="gptq-4bit-32g-actorder_True"`: revision or commit hash for the model
- `--build-arg QUANTIZATION="gptq"`: quantization type (e.g., `gptq`, `awq`, or empty for an unquantized model)
- `--build-arg TENSOR_PARALLEL_SIZE="1"`: the number of GPUs to spread tensor processing across
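For reference, a manual image build that overrides these arguments might look like the sketch below. The `make build-vllm` target in the Deployment section remains the supported build path; the Dockerfile location and image tag shown here are illustrative assumptions, not fixed project values.

```bash
# Illustrative manual build; the Dockerfile path and image tag are assumptions.
docker build \
  --build-arg HF_HUB_ENABLE_HF_TRANSFER="1" \
  --build-arg REPO_ID="TheBloke/Synthia-7B-v2.0-GPTQ" \
  --build-arg REVISION="gptq-4bit-32g-actorder_True" \
  --build-arg QUANTIZATION="gptq" \
  --build-arg TENSOR_PARALLEL_SIZE="1" \
  -t leapfrogai/vllm:dev \
  -f packages/vllm/Dockerfile .
```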

### Deployment

To build and deploy the vLLM backend Zarf package into an existing UDS Kubernetes cluster:

> [!IMPORTANT]
> Execute the following commands from the root of the LeapfrogAI repository.

```bash
pip install 'huggingface_hub[cli,hf_transfer]'  # Used to download the model weights from HuggingFace
make build-vllm LOCAL_VERSION=dev
uds zarf package deploy packages/vllm/zarf-package-vllm-*-dev.tar.zst --confirm
```
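After the package deploys, you can sanity-check that the backend pod came up. The `leapfrogai` namespace and the `vllm` Deployment name below are assumptions; substitute whatever names your UDS cluster actually uses.

```bash
# Hypothetical post-deploy check; the namespace and Deployment name are assumptions.
kubectl get pods -n leapfrogai
kubectl logs -n leapfrogai deployment/vllm --tail=50
```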

### Local Development

To run the vLLM backend locally:

> [!IMPORTANT]
> Execute the following commands from this sub-directory.

```bash
# Install dev and runtime dependencies
make install

# Clone Model
python src/model_download.py

# Start the model backend
make dev
```
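If you want to pull a different model for local development, the `huggingface-cli` tool (installed via `huggingface_hub[cli]`) can fetch it directly; a sketch follows. The repository ID, revision, and local directory here are illustrative assumptions and may not match what `src/model_download.py` expects.

```bash
# Hypothetical manual download of an alternate model; the repository ID, revision,
# and local directory are assumptions and may not match src/model_download.py.
huggingface-cli download TheBloke/Synthia-7B-v2.0-GPTQ \
  --revision gptq-4bit-32g-actorder_True \
  --local-dir .model
```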