Improve proxy server usage #2488

Merged: 6 commits, Sep 23, 2024
72 changes: 65 additions & 7 deletions docs/en/llm/proxy_server.md
@@ -7,12 +7,19 @@ The request distributor service can parallelize multiple api_server services. Us
Start the proxy service:

```shell
-python3 -m lmdeploy.serve.proxy.proxy --server_name {server_name} --server_port {server_port} --strategy "min_expected_latency"
+lmdeploy serve proxy --server-name {server_name} --server-port {server_port} --strategy "min_expected_latency"
```

Once startup succeeds, the script also prints the URL of the proxy service. Open this URL in a browser to access the Swagger UI.
Users can then register an `api_server` with the proxy service at startup via the `--proxy-url` option. For example:
`lmdeploy serve api_server InternLM/internlm2-chat-1_8b --proxy-url http://0.0.0.0:8000`.
This way, the `api_server` services can be accessed through the proxy node. The proxy node is used exactly like an `api_server`: both expose an OpenAI-compatible interface (see the request example after the endpoint list below).
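
The proxy can also be started programmatically. A minimal sketch, assuming the CLI flags above map one-to-one onto keyword arguments of the `proxy` entry point (the CLI dispatches this way via `convert_args`; treat the exact signature as an assumption):

```python
# a sketch: start the proxy from Python rather than the CLI;
# assumes proxy() accepts the same options as `lmdeploy serve proxy`
from lmdeploy.serve.proxy.proxy import proxy

proxy(server_name='0.0.0.0',
      server_port=8000,
      strategy='min_expected_latency')
```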

## API
- /v1/models
- /v1/chat/completions
- /v1/completions
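
These endpoints behave exactly as they do on a single `api_server`. For example, a chat request can be sent straight to the proxy. A sketch using `requests`, where the model name is a placeholder (query `/v1/models` first for the names actually served):

```python
# a sketch: an OpenAI-style chat request routed through the proxy
import requests

url = 'http://localhost:8000/v1/chat/completions'
headers = {'accept': 'application/json', 'Content-Type': 'application/json'}
data = {
    'model': 'internlm2',  # placeholder; list the served models via /v1/models
    'messages': [{'role': 'user', 'content': 'Hello!'}],
}
response = requests.post(url, headers=headers, json=data)
print(response.json())
```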

## Node Management

Through Swagger UI, we can see multiple APIs. Those related to api_server node management include:

@@ -22,13 +29,64 @@ Through Swagger UI, we can see multiple APIs. Those related to api_server node m

These are used, respectively, to list all api_server service nodes, add a node, and remove a node. They can be invoked directly from the Swagger UI, via curl, or from Python, as shown below.

-APIs related to usage include:
### Node Management through curl

-- /v1/models
-- /v1/chat/completions
-- /v1/completions
```shell
curl -X 'GET' \
'http://localhost:8000/nodes/status' \
-H 'accept: application/json'
```

-The usage of these APIs is the same as that of api_server.
```shell
curl -X 'POST' \
'http://localhost:8000/nodes/add' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"url": "http://0.0.0.0:23333"
}'
```

```shell
curl -X 'POST' \
'http://localhost:8000/nodes/remove?node_url=http://0.0.0.0:23333' \
-H 'accept: application/json' \
-d ''
```

### Node Management through python

```python
# query all nodes
import requests
url = 'http://localhost:8000/nodes/status'
headers = {'accept': 'application/json'}
response = requests.get(url, headers=headers)
print(response.text)
```

```python
# add a new node
import requests
url = 'http://localhost:8000/nodes/add'
headers = {
'accept': 'application/json',
'Content-Type': 'application/json'
}
data = {"url": "http://0.0.0.0:23333"}
response = requests.post(url, headers=headers, json=data)
print(response.text)
```

```python
# delete a node
import requests
url = 'http://localhost:8000/nodes/remove'
headers = {'accept': 'application/json'}
params = {'node_url': 'http://0.0.0.0:23333'}
response = requests.post(url, headers=headers, params=params)
print(response.text)
```

## Dispatch Strategy
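
The proxy supports three dispatch strategies, selected with `--strategy`: `random`, `min_expected_latency`, and `min_observed_latency`. The sketch below is a conceptual illustration of what each choice could mean, not lmdeploy's actual implementation; the per-node bookkeeping fields are hypothetical:

```python
# conceptual sketch of the three dispatch strategies (not lmdeploy's code)
import random
from typing import Dict

# hypothetical per-node bookkeeping: speed, pending requests, measured latency
nodes: Dict[str, dict] = {
    'http://0.0.0.0:23333': {'speed': 1.0, 'unfinished': 2, 'latency': 0.8},
    'http://0.0.0.0:23334': {'speed': 2.0, 'unfinished': 5, 'latency': 0.5},
}

def dispatch(strategy: str) -> str:
    urls = list(nodes)
    if strategy == 'random':
        # random choice weighted by each node's relative speed
        return random.choices(urls, weights=[nodes[u]['speed'] for u in urls])[0]
    if strategy == 'min_expected_latency':
        # estimate latency as pending work divided by node speed
        return min(urls, key=lambda u: nodes[u]['unfinished'] / nodes[u]['speed'])
    if strategy == 'min_observed_latency':
        # prefer the node with the lowest recently measured latency
        return min(urls, key=lambda u: nodes[u]['latency'])
    raise ValueError(f'unknown strategy: {strategy}')

print(dispatch('min_expected_latency'))
```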

73 changes: 65 additions & 8 deletions docs/zh_cn/llm/proxy_server.md
@@ -7,28 +7,85 @@
Start the proxy service:

```shell
-python3 -m lmdeploy.serve.proxy.proxy --server_name {server_name} --server_port {server_port} --strategy "min_expected_latency"
+lmdeploy serve proxy --server-name {server_name} --server-port {server_port} --strategy "min_expected_latency"
```

Once startup succeeds, the script also prints the URL of the proxy service. Open this URL in a browser to access the Swagger UI.
Users can then register an api_server with the proxy service at startup via the `--proxy-url` option. For example: `lmdeploy serve api_server InternLM/internlm2-chat-1_8b --proxy-url http://0.0.0.0:8000`.
This way, the api_server services can be accessed through the proxy node. The proxy node is used exactly like an api_server: both expose an OpenAI-compatible interface.

## API
- /v1/models
- /v1/chat/completions
- /v1/completions

## Node Management

Through the Swagger UI, we can see multiple APIs. Those related to api_server node management include:

- /nodes/status
- /nodes/add
- /nodes/remove

-These are used, respectively, to list all api_server service nodes, add a node, and remove a node.
+These are used, respectively, to list all api_server service nodes, add a node, and remove a node. The most direct way to invoke them is from the Swagger UI in a browser; they can also be called from the command line or from Python.

-APIs related to usage include:
### Node Management through curl

-- /v1/models
-- /v1/chat/completions
-- /v1/completions
```shell
curl -X 'GET' \
'http://localhost:8000/nodes/status' \
-H 'accept: application/json'
```

-The usage of these APIs is the same as that of api_server.
```shell
curl -X 'POST' \
'http://localhost:8000/nodes/add' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"url": "http://0.0.0.0:23333"
}'
```

```shell
curl -X 'POST' \
'http://localhost:8000/nodes/remove?node_url=http://0.0.0.0:23333' \
-H 'accept: application/json' \
-d ''
```

### Node Management through python

```python
# query all nodes
import requests
url = 'http://localhost:8000/nodes/status'
headers = {'accept': 'application/json'}
response = requests.get(url, headers=headers)
print(response.text)
```

```python
# add a new node
import requests
url = 'http://localhost:8000/nodes/add'
headers = {
'accept': 'application/json',
'Content-Type': 'application/json'
}
data = {"url": "http://0.0.0.0:23333"}
response = requests.post(url, headers=headers, json=data)
print(response.text)
```

```python
# delete a node
import requests
url = 'http://localhost:8000/nodes/remove'
headers = {'accept': 'application/json'}
params = {'node_url': 'http://0.0.0.0:23333'}
response = requests.post(url, headers=headers, params=params)
print(response.text)
```

## Dispatch Strategy

43 changes: 43 additions & 0 deletions lmdeploy/cli/serve.py
@@ -134,6 +134,10 @@ def add_parser_api_server():
                            type=str,
                            default=['*'],
                            help='A list of allowed http headers for cors')
        parser.add_argument('--proxy-url',
                            type=str,
                            default=None,
                            help='The proxy url for api server.')
        # common args
        ArgumentHelper.backend(parser)
        ArgumentHelper.log_level(parser)
@@ -204,6 +208,36 @@ def add_parser_api_client():
                            'api key will be used')
        ArgumentHelper.session_id(parser)

    @staticmethod
    def add_parser_proxy():
        """Add parser for proxy server command."""
        parser = SubCliServe.subparsers.add_parser(
            'proxy',
            formatter_class=DefaultsAndTypesHelpFormatter,
            description=SubCliServe.proxy.__doc__,
            help=SubCliServe.proxy.__doc__)
        parser.set_defaults(run=SubCliServe.proxy)
        parser.add_argument('--server-name',
                            type=str,
                            default='0.0.0.0',
                            help='Host ip for proxy serving')
        parser.add_argument('--server-port',
                            type=int,
                            default=8000,
                            help='Server port of the proxy')
        parser.add_argument(
            '--strategy',
            type=str,
            choices=['random', 'min_expected_latency', 'min_observed_latency'],
            default='min_expected_latency',
            help='the strategy to dispatch requests to nodes')
        parser.add_argument('--api-key',
                            type=str,
                            default=None,
                            help='api key. Default to None, which means no '
                            'api key will be used')
        ArgumentHelper.ssl(parser)
    @staticmethod
    def gradio(args):
        """Serve LLMs with web UI using gradio."""
@@ -311,6 +345,7 @@ def api_server(args):
                       log_level=args.log_level.upper(),
                       api_keys=args.api_keys,
                       ssl=args.ssl,
                       proxy_url=args.proxy_url,
                       max_log_len=args.max_log_len)
@staticmethod
@@ -320,8 +355,16 @@ def api_client(args):
        kwargs = convert_args(args)
        run_api_client(**kwargs)

    @staticmethod
    def proxy(args):
        """Proxy server that manages distributed api_server nodes."""
        from lmdeploy.serve.proxy.proxy import proxy
        kwargs = convert_args(args)
        proxy(**kwargs)

    @staticmethod
    def add_parsers():
        SubCliServe.add_parser_gradio()
        SubCliServe.add_parser_api_server()
        SubCliServe.add_parser_api_client()
        SubCliServe.add_parser_proxy()
35 changes: 35 additions & 0 deletions lmdeploy/serve/openai/api_server.py
@@ -40,6 +40,9 @@ class VariableInterface:
    session_id: int = 0
    api_keys: Optional[List[str]] = None
    request_hosts = []
    # following are for registering to proxy server
    proxy_url: Optional[str] = None
    api_server_url: Optional[str] = None
> **Reviewer (Collaborator):** Is it necessary? Can we use `http://{server_ip}:{server_port}` instead?
>
> **Author (Collaborator):** It is a global variable. `VariableInterface.api_server_url = f'{http_or_https}://{server_name}:{server_port}'`

app = FastAPI(docs_url='/')
@@ -926,6 +929,33 @@ async def stream_results() -> AsyncGenerator[bytes, None]:
    return JSONResponse(ret)


@app.on_event('startup')
async def startup_event():
    """Register this api_server to the proxy server, if one is configured."""
    if VariableInterface.proxy_url is None:
        return
    try:
        import requests
        url = f'{VariableInterface.proxy_url}/nodes/add'
        data = {
            'url': VariableInterface.api_server_url,
            'status': {
                'models': get_model_list()
            }
        }
        headers = {
            'accept': 'application/json',
            'Content-Type': 'application/json'
        }
        response = requests.post(url, headers=headers, json=data)

        if response.status_code != 200:
            raise HTTPException(status_code=400,
                                detail='Service registration failed')
        print(response.text)
    except Exception as e:
        print(f'Service registration failed: {e}')


def serve(model_path: str,
          model_name: Optional[str] = None,
          backend: Literal['turbomind', 'pytorch'] = 'turbomind',
@@ -941,6 +971,7 @@ def serve(model_path: str,
          log_level: str = 'ERROR',
          api_keys: Optional[Union[List[str], str]] = None,
          ssl: bool = False,
          proxy_url: Optional[str] = None,
          max_log_len: int = None,
          **kwargs):
    """An example to perform model inference through the command line
@@ -983,6 +1014,7 @@ def serve(model_path: str,
            api key applied.
        ssl (bool): Enable SSL. Requires OS Environment variables
            'SSL_KEYFILE' and 'SSL_CERTFILE'.
        proxy_url (str): The proxy url to register the api_server.
        max_log_len (int): Max number of prompt characters or prompt tokens
            being printed in log. Default: Unlimited
    """
@@ -1019,6 +1051,9 @@ def serve(model_path: str,
        max_log_len=max_log_len,
        **kwargs)

    if proxy_url is not None:
        VariableInterface.proxy_url = proxy_url
        VariableInterface.api_server_url = f'{http_or_https}://{server_name}:{server_port}'  # noqa
    for i in range(3):
        print(
            f'HINT: Please open \033[93m\033[1m{http_or_https}://'