Add Rust Router Python Binding #1891

Merged

@austin362667 austin362667 commented Nov 2, 2024

Motivation

Add Python binding support for sglang-router, so that the router can be run from a Python package as well as from the Rust binary.

Modifications

Package the router with the PyO3/maturin build tool.

  1. Use maturin develop to build the Python package and install it into the current Python environment.
  2. Add pyproject.toml to hold the PyO3 build configuration.
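
After maturin develop finishes, one quick way to confirm that the extension landed in the active environment is to ask Python whether the module can be located. This is a minimal sketch, assuming the package is importable as sglang_router (the module name used in the test section of this description); is_installed is a hypothetical helper, not part of the PR.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be located in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "sglang_router" is the module name imported in the test section;
    # treat it as an assumption if the package is renamed.
    print("sglang_router installed:", is_installed("sglang_router"))
```

The check only locates the module; it does not load the compiled extension, so a successful result does not rule out a link-time error on import.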

Test

Set up two L4 GPUs and load HuggingFaceTB/SmolLM2-135M on each

# Launch first worker on GPU 0
export CUDA_VISIBLE_DEVICES=0
python -m sglang.launch_server \
    --model-path HuggingFaceTB/SmolLM2-135M \
    --host 127.0.0.1 \
    --port 30000

# Launch second worker on GPU 1
export CUDA_VISIBLE_DEVICES=1
python -m sglang.launch_server \
    --model-path HuggingFaceTB/SmolLM2-135M \
    --host 127.0.0.1 \
    --port 30002
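
Before starting the router it can help to confirm that both workers are accepting connections. The sketch below is one way to do that with a plain TCP probe; port_is_open is a hypothetical helper, and an open port only shows that a process is listening, not that the model has finished loading.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the two launch commands above.
    for port in (30000, 30002):
        state = "ready" if port_is_open("127.0.0.1", port) else "not ready"
        print(port, state)
```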

Start the Router via Python Binding

from sglang_router import Router

# Create a Router instance with:
# - host: the address to bind to (e.g., "127.0.0.1")
# - port: the port number (e.g., 3001)
# - worker_urls: list of worker URLs to distribute requests to
# - policy: how requests are assigned to workers (here, "random")
router = Router(
    host="127.0.0.1",
    port=3001,
    worker_urls=[
        "http://localhost:30000",
        "http://localhost:30002",
    ],
    policy="random",
)

# Start the router - this will block and run the server
router.start()
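
The policy="random" argument is not explained further in this description. A plausible reading is that each incoming request is forwarded to a worker chosen uniformly at random. The sketch below illustrates that idea in plain Python; it is an assumption about the behaviour, not the router's Rust implementation, and pick_worker is a hypothetical name.

```python
import random
from typing import Optional, Sequence


def pick_worker(worker_urls: Sequence[str], rng: Optional[random.Random] = None) -> str:
    """Pick one worker URL uniformly at random (illustrative only)."""
    if not worker_urls:
        raise ValueError("worker_urls must not be empty")
    # Fall back to the module-level generator when no seeded RNG is supplied.
    chooser = rng if rng is not None else random
    return chooser.choice(worker_urls)


if __name__ == "__main__":
    workers = ["http://localhost:30000", "http://localhost:30002"]
    counts = {url: 0 for url in workers}
    for _ in range(1000):
        counts[pick_worker(workers)] += 1
    # Over many requests the two counts should be roughly equal.
    print(counts)
```

A uniform random choice needs no shared state between requests, which keeps it simple, but it ignores how busy each worker currently is.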

Benchmarking -- Connect to the Router

python -m sglang.bench_serving --backend sglang --host 127.0.0.1 --port 3001
  1. Rust Debug Build
============ Serving Benchmark Result ============
Backend:                                 sglang
Traffic request rate:                    inf
Successful requests:                     49
Benchmark duration (s):                  3.96
Total input tokens:                      12110
Total generated tokens:                  10321
Total generated tokens (retokenized):    10316
Request throughput (req/s):              12.39
Input token throughput (tok/s):          3061.75
Output token throughput (tok/s):         2609.44
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   1438.58
Median E2E Latency (ms):                 1200.87
---------------Time to First Token----------------
Mean TTFT (ms):                          605.67
Median TTFT (ms):                        623.32
P99 TTFT (ms):                           680.36
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          6.14
Median TPOT (ms):                        4.18
P99 TPOT (ms):                           26.07
---------------Inter-token Latency----------------
Mean ITL (ms):                           3.97
Median ITL (ms):                         3.77
P99 ITL (ms):                            5.32
==================================================
  2. Rust Release Build
============ Serving Benchmark Result ============
Backend:                                 sglang
Traffic request rate:                    inf
Successful requests:                     49
Benchmark duration (s):                  3.88
Total input tokens:                      12096
Total generated tokens:                  10043
Total generated tokens (retokenized):    10037
Request throughput (req/s):              12.64
Input token throughput (tok/s):          3121.06
Output token throughput (tok/s):         2591.33
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   1398.56
Median E2E Latency (ms):                 1189.50
---------------Time to First Token----------------
Mean TTFT (ms):                          604.56
Median TTFT (ms):                        611.51
P99 TTFT (ms):                           655.21
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          5.75
Median TPOT (ms):                        3.99
P99 TPOT (ms):                           25.15
---------------Inter-token Latency----------------
Mean ITL (ms):                           3.89
Median ITL (ms):                         3.70
P99 ITL (ms):                            5.11
==================================================
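
To put the two runs side by side, the relative change from the debug build to the release build can be computed from the figures reported above. This is a small worked example; the numbers are copied from the two result blocks, and percent_change is a hypothetical helper.

```python
def percent_change(debug: float, release: float) -> float:
    """Relative change from the debug value to the release value, in percent."""
    return (release - debug) / debug * 100.0


# (debug, release) pairs copied from the benchmark output above.
results = {
    "Request throughput (req/s)": (12.39, 12.64),
    "Mean E2E Latency (ms)": (1438.58, 1398.56),
    "Mean TTFT (ms)": (605.67, 604.56),
    "Mean TPOT (ms)": (6.14, 5.75),
}

if __name__ == "__main__":
    for name, (debug, release) in results.items():
        print(f"{name}: {percent_change(debug, release):+.2f}%")
```

Each build was measured in a single run of 49 requests, and the two runs processed slightly different token counts, so the differences are indicative only and may be within run-to-run noise.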

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@austin362667 austin362667 force-pushed the feat/rust_router/add_py_binding branch from 8dc15ec to 1fc3b26 Compare November 5, 2024 11:19
austin362667 and others added 2 commits November 6, 2024 21:28
Signed-off-by: Austin Liu <austin362667@gmail.com>

Clean up

Signed-off-by: Austin Liu <austin362667@gmail.com>
@ByronHsu ByronHsu force-pushed the feat/rust_router/add_py_binding branch from 809e06c to 0d00553 Compare November 6, 2024 22:02
@ByronHsu ByronHsu enabled auto-merge (squash) November 6, 2024 22:22
@ByronHsu ByronHsu disabled auto-merge November 6, 2024 22:22
@Ying1123 Ying1123 merged commit 4b1d7a2 into sgl-project:main Nov 7, 2024
2 checks passed
HaiShaw added a commit to HaiShaw/sglang that referenced this pull request Nov 7, 2024
zhaochenyang20 pushed a commit that referenced this pull request Nov 7, 2024
Signed-off-by: Austin Liu <austin362667@gmail.com>
Co-authored-by: ByronHsu <byronhsu1230@gmail.com>
3 participants