-
Notifications
You must be signed in to change notification settings - Fork 44
2.2.12 Backend: SGLang
av edited this page Sep 14, 2024
·
1 revision
Handle:
sglang
URL: http://localhost:34091
SGLang is a fast serving framework for large language models and vision language models.
# [Optional] Pre-pull the image
harbor pull sglang
SGLang is similar to vLLM in the models it can run, so the configuration is similar.
# Quickly lookup some of the compatible quants
harbor hf find awq
harbor hf find gptq
# Download with HF CLI
harbor hf download bartowski/Meta-Llama-3.1-70B-Instruct-GGUF
# Set the model to run using HF specifier
harbor sglang model google/gemma-2-2b-it
# To run a gated model, ensure that you've
# also set your Huggingface API Token
harbor hf token <your-token>
You can specify additional args via harbor sglang args
:
# See original CLI help for available options
harbor run sglang --help
# Set the extra arguments via "harbor args"
harbor sglang args --context-length 2048 --disable-cuda-graph