2.2.12 Backend: SGLang

SGLang

Handle: sglang URL: http://localhost:34091

PyPI - Downloads

SGLang is a fast serving framework for large language models and vision language models.

Starting

# [Optional] Pre-pull the image
harbor pull sglang

Configuration

SGLang is similar to vLLM in the models it can run, so the configuration is similar.

# Quickly lookup some of the compatible quants
harbor hf find awq
harbor hf find gptq

# Download with HF CLI
harbor hf download bartowski/Meta-Llama-3.1-70B-Instruct-GGUF

# Set the model to run using HF specifier
harbor sglang model google/gemma-2-2b-it

# To run a gated model, ensure that you've
# also set your Huggingface API Token
harbor hf token <your-token>

You can specify additional args via harbor sglang args:

# See original CLI help for available options
harbor run sglang --help

# Set the extra arguments via "harbor args"
harbor sglang args --context-length 2048 --disable-cuda-graph

Home | CLI Reference | Services | Adding New Service | Compatibility

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2.2.12 Backend: SGLang

SGLang

Starting

Configuration

Clone this wiki locally