This project provides a multi-process client and server setup to interact with OpenAI's API, including support for the VLLM backend. The setup includes data parallelism support for VLLM, which significantly improves generation speed.
- Python 3.7+
- pip
- OpenAI API (old version): `pip install openai==0.28`
- VLLM: `pip install vllm==0.5.3`
- Gradio Client: Use the `gradio_client.py` script for a simple interface.
- Ray Client: Use the `openai_client.py` script for multi-process support.
- FastAPI Server: Use the `fast_api.py` script for an HTTP server.
- VLLM Server: Use the `openai_server.py` script for the VLLM backend (a minimal client call is sketched after this list).
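As a rough illustration of how a client might talk to one of these servers, the sketch below uses the old `openai==0.28` interface listed in the requirements against an OpenAI-compatible endpoint. The server address and model name are assumptions; adjust them to your deployment.

```python
# A minimal sketch of querying an OpenAI-compatible endpoint with openai==0.28.
# The server address and model name below are assumptions, not project defaults.
import openai

openai.api_base = "http://localhost:8000/v1"  # assumed address of the local server
openai.api_key = "EMPTY"                      # a local server typically ignores the key

response = openai.ChatCompletion.create(
    model="your-model-name",  # placeholder: must match the model being served
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.0,
)
print(response["choices"][0]["message"]["content"])
```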
VLLM supports tensor parallelism and pipeline parallelism; this project adds data parallelism on top. You can split the data into multiple chunks and let multiple VLLM engines generate results simultaneously. Run the data-parallel evaluation with:

```bash
bash eval.sh
```
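The script above is the project's own launcher; as a non-authoritative sketch of the underlying idea, the example below splits the prompts into chunks and runs one VLLM engine per GPU in its own process. The model name, data size, and sampling settings are placeholders.

```python
# A minimal sketch of the data-parallel idea (not the project's exact implementation):
# split the prompts into chunks and run one vLLM engine per GPU in a separate process.
import os
import multiprocessing as mp


def worker(rank, prompts, results):
    # Pin this worker to a single GPU before vLLM is imported,
    # so each engine sees exactly one device.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(rank)
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")  # placeholder model
    params = SamplingParams(temperature=0.0, max_tokens=128)
    outputs = llm.generate(prompts, params)
    results.put((rank, [o.outputs[0].text for o in outputs]))


if __name__ == "__main__":
    mp.set_start_method("spawn")   # avoid CUDA initialization issues with fork
    dp_size = 8                    # number of data-parallel engines (dp=8)
    prompts = [f"Question {i}" for i in range(4666)]
    chunks = [prompts[i::dp_size] for i in range(dp_size)]

    results = mp.Queue()
    procs = [mp.Process(target=worker, args=(r, chunks[r], results)) for r in range(dp_size)]
    for p in procs:
        p.start()
    gathered = dict(results.get() for _ in procs)  # rank -> list of generations
    for p in procs:
        p.join()
```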
All runs below use VLLM 0.5.3 on the same 4,666-sample dataset; the table shows how generation time changes with different configurations:

| Configuration | Time |
|---|---|
| dp=8 + enable_chunked_prefill | 1:17 |
| tp=1 | 8:24 |
| tp=8 | 6:03 |
| pp=1 (pipeline parallelism not supported) | N/A |
| tp=8 + enable_prefix_caching | Segfault |
| tp=8 + enable_chunked_prefill=True | 5:28 |
| tp=8 + enable_chunked_prefill + max_num_seqs=1024 + max_num_batched_tokens=1024 | 5:43 |
| tp=8 + enable_chunked_prefill + max_num_seqs=4096 + max_num_batched_tokens=4096 | 6:27 |
| max_num_seqs=4096 + max_num_batched_tokens=512 | Error: max_num_batched_tokens (512) must be >= max_num_seqs (4096) |
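These knobs map onto VLLM engine arguments. A hedged sketch of constructing an engine with the tp=8 + chunked-prefill settings from the table might look like the following; the model name is a placeholder and is not part of this project.

```python
# A sketch of the table's configuration knobs expressed as vLLM engine arguments
# (vllm==0.5.3). The model path is a placeholder; adjust it to your checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    tensor_parallel_size=8,        # tp=8
    enable_chunked_prefill=True,   # enable_chunked_prefill
    max_num_seqs=1024,             # max_num_seqs 1024
    max_num_batched_tokens=1024,   # must be >= max_num_seqs, otherwise vLLM raises an error
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```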