Add instruction for SkyPilot (#1)
* Add instruction for SkyPilot

* rename yaml

* Update README.md

* update

---------

Co-authored-by: Zongheng Yang <zongheng.y@gmail.com>
Michaelvll and concretevitamin committed Aug 18, 2024
1 parent fa13b95 commit 4c54e9c
Showing 1 changed file with 39 additions and 0 deletions: README.md
@@ -87,6 +87,45 @@ docker run --gpus all \
1. Copy [compose.yaml](./docker/compose.yaml) to your local machine.
2. Execute the command `docker compose up -d` in your terminal.
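For reference, a minimal compose file for this setup might look like the sketch below. This is an illustrative sketch only, assuming the `lmsysorg/sglang` image and the launch command shown elsewhere in this README; the repository's own [compose.yaml](./docker/compose.yaml) is the authoritative version.

```yaml
# Illustrative sketch; use the repository's ./docker/compose.yaml instead.
services:
  sglang:
    image: lmsysorg/sglang:latest
    ports:
      - "30000:30000"
    environment:
      HF_TOKEN: ${HF_TOKEN}  # pass your Hugging Face token from the host env
    command:
      - python3
      - -m
      - sglang.launch_server
      - --model-path
      - meta-llama/Meta-Llama-3.1-8B-Instruct
      - --host
      - "0.0.0.0"
      - --port
      - "30000"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```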

### Method 5: Run on Kubernetes or Clouds with SkyPilot

To deploy on Kubernetes or clouds, you can use [SkyPilot](https://github.com/skypilot-org/skypilot).

1. Install SkyPilot and set up a Kubernetes cluster or cloud access: see [SkyPilot's documentation](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html).
2. Deploy on your own infra with a single command and get the HTTP API endpoint:
<details>
<summary>SkyPilot YAML: <code>sglang.yaml</code></summary>

```yaml
# sglang.yaml
envs:
HF_TOKEN: null

resources:
image_id: docker:lmsysorg/sglang:latest
accelerators: A100
ports: 30000

run: |
conda deactivate
python3 -m sglang.launch_server \
--model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
--host 0.0.0.0 \
--port 30000
```
</details>
```bash
# Deploy on any cloud or Kubernetes cluster. Use --cloud <cloud> to select a specific cloud provider.
HF_TOKEN=<secret> sky launch -c sglang --env HF_TOKEN sglang.yaml

# Get the HTTP API endpoint
sky status --endpoint 30000 sglang
```
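Once the endpoint is printed, you can send requests to the server. The sketch below is a minimal example using only the standard library; the `/generate` route and the `text`/`sampling_params` payload fields follow SGLang's native HTTP API, and the address is a placeholder you should replace with the output of `sky status --endpoint 30000 sglang`.

```python
import json
import urllib.request

# Placeholder; substitute the address printed by
# `sky status --endpoint 30000 sglang`.
ENDPOINT = "http://1.2.3.4:30000"

# SGLang's native /generate API takes a prompt plus sampling parameters.
payload = {
    "text": "Once upon a time,",
    "sampling_params": {"max_new_tokens": 16, "temperature": 0},
}

request = urllib.request.Request(
    f"{ENDPOINT}/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the cluster is up:
# with urllib.request.urlopen(request) as resp:
#     print(json.loads(resp.read())["text"])
```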
### Common Notes
- [FlashInfer](https://github.com/flashinfer-ai/flashinfer) is currently one of the dependencies that must be installed for SGLang. If you are using NVIDIA GPU devices below sm80, such as the T4, you cannot use SGLang for the time being. We expect to resolve this issue soon, so please stay tuned. If you encounter any FlashInfer-related issues on sm80+ devices (e.g., A100, L40S, H100), consider using Triton's kernels by adding `--disable-flashinfer --disable-flashinfer-sampling`, and raise an issue.
- If you only need to use the OpenAI backend, you can avoid installing other dependencies by using `pip install "sglang[openai]"`.