Improve proxy server usage #2488

Merged: 6 commits, Sep 23, 2024
72 changes: 65 additions & 7 deletions docs/en/llm/proxy_server.md
@@ -7,12 +7,19 @@ The request distributor service can parallelize multiple api_server services. Us
Start the proxy service:

```shell
-python3 -m lmdeploy.serve.proxy.proxy --server_name {server_name} --server_port {server_port} --strategy "min_expected_latency"
+lmdeploy serve proxy --server-name {server_name} --server-port {server_port} --strategy "min_expected_latency"
```

Once startup succeeds, the script also prints the URL of the proxy service. Open this URL in a browser to access the Swagger UI.
Users can then register an `api_server` with the proxy service at startup via the `--proxy-url` option. For example:
`lmdeploy serve api_server InternLM/internlm2-chat-1_8b --proxy-url http://0.0.0.0:8000`.
This way, the `api_server` services can be accessed through the proxy node. The proxy node is used exactly like an `api_server`: both expose an OpenAI-compatible interface (see the request example after the endpoint list below).
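
The proxy can also be started programmatically. A minimal sketch, assuming the CLI flags above map one-to-one onto keyword arguments of the `proxy` entry point (the CLI dispatches this way via `convert_args`; treat the exact signature as an assumption):

```python
# a sketch: start the proxy from Python rather than the CLI;
# assumes proxy() accepts the same options as `lmdeploy serve proxy`
from lmdeploy.serve.proxy.proxy import proxy

proxy(server_name='0.0.0.0',
      server_port=8000,
      strategy='min_expected_latency')
```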

## API
- /v1/models
- /v1/chat/completions
- /v1/completions
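
These endpoints behave exactly as they do on a single `api_server`. For example, a chat request can be sent straight to the proxy. A sketch using `requests`, where the model name is a placeholder (query `/v1/models` first for the names actually served):

```python
# a sketch: an OpenAI-style chat request routed through the proxy
import requests

url = 'http://localhost:8000/v1/chat/completions'
headers = {'accept': 'application/json', 'Content-Type': 'application/json'}
data = {
    'model': 'internlm2',  # placeholder; list the served models via /v1/models
    'messages': [{'role': 'user', 'content': 'Hello!'}],
}
response = requests.post(url, headers=headers, json=data)
print(response.json())
```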

## Node Management

Through Swagger UI, we can see multiple APIs. Those related to api_server node management include:

@@ -22,13 +29,64 @@ Through Swagger UI, we can see multiple APIs. Those related to api_server node m

These are used, respectively, to list all api_server service nodes, add a node, and remove a node. They can be invoked directly from the Swagger UI, via curl, or from Python, as shown below.

-APIs related to usage include:
### Node Management through curl

-- /v1/models
-- /v1/chat/completions
-- /v1/completions
```shell
curl -X 'GET' \
'http://localhost:8000/nodes/status' \
-H 'accept: application/json'
```

-The usage of these APIs is the same as that of api_server.
```shell
curl -X 'POST' \
'http://localhost:8000/nodes/add' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"url": "http://0.0.0.0:23333"
}'
```

```shell
curl -X 'POST' \
'http://localhost:8000/nodes/remove?node_url=http://0.0.0.0:23333' \
-H 'accept: application/json' \
-d ''
```

### Node Management through python

```python
# query all nodes
import requests
url = 'http://localhost:8000/nodes/status'
headers = {'accept': 'application/json'}
response = requests.get(url, headers=headers)
print(response.text)
```

```python
# add a new node
import requests
url = 'http://localhost:8000/nodes/add'
headers = {
'accept': 'application/json',
'Content-Type': 'application/json'
}
data = {"url": "http://0.0.0.0:23333"}
response = requests.post(url, headers=headers, json=data)
print(response.text)
```

```python
# delete a node
import requests
url = 'http://localhost:8000/nodes/remove'
headers = {'accept': 'application/json'}
params = {'node_url': 'http://0.0.0.0:23333'}
response = requests.post(url, headers=headers, params=params)
print(response.text)
```

## Dispatch Strategy
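
The proxy supports three dispatch strategies, selected with `--strategy`: `random`, `min_expected_latency`, and `min_observed_latency`. The sketch below is a conceptual illustration of what each choice could mean, not lmdeploy's actual implementation; the per-node bookkeeping fields are hypothetical:

```python
# conceptual sketch of the three dispatch strategies (not lmdeploy's code)
import random
from typing import Dict

# hypothetical per-node bookkeeping: speed, pending requests, measured latency
nodes: Dict[str, dict] = {
    'http://0.0.0.0:23333': {'speed': 1.0, 'unfinished': 2, 'latency': 0.8},
    'http://0.0.0.0:23334': {'speed': 2.0, 'unfinished': 5, 'latency': 0.5},
}

def dispatch(strategy: str) -> str:
    urls = list(nodes)
    if strategy == 'random':
        # random choice weighted by each node's relative speed
        return random.choices(urls, weights=[nodes[u]['speed'] for u in urls])[0]
    if strategy == 'min_expected_latency':
        # estimate latency as pending work divided by node speed
        return min(urls, key=lambda u: nodes[u]['unfinished'] / nodes[u]['speed'])
    if strategy == 'min_observed_latency':
        # prefer the node with the lowest recently measured latency
        return min(urls, key=lambda u: nodes[u]['latency'])
    raise ValueError(f'unknown strategy: {strategy}')

print(dispatch('min_expected_latency'))
```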

73 changes: 65 additions & 8 deletions docs/zh_cn/llm/proxy_server.md
@@ -7,28 +7,85 @@
Start the proxy service:

```shell
-python3 -m lmdeploy.serve.proxy.proxy --server_name {server_name} --server_port {server_port} --strategy "min_expected_latency"
+lmdeploy serve proxy --server-name {server_name} --server-port {server_port} --strategy "min_expected_latency"
```

Once startup succeeds, the script also prints the URL of the proxy service. Open this URL in a browser to access the Swagger UI.
Users can then register an api_server with the proxy service at startup via the `--proxy-url` option. For example: `lmdeploy serve api_server InternLM/internlm2-chat-1_8b --proxy-url http://0.0.0.0:8000`.
This way, the api_server services can be accessed through the proxy node. The proxy node is used exactly like an api_server: both expose an OpenAI-compatible interface.

## API
- /v1/models
- /v1/chat/completions
- /v1/completions

## Node Management

Through the Swagger UI, we can see multiple APIs. Those related to api_server node management include:

- /nodes/status
- /nodes/add
- /nodes/remove

-These are used, respectively, to list all api_server service nodes, add a node, and remove a node.
+These are used, respectively, to list all api_server service nodes, add a node, and remove a node. The most direct way to invoke them is from the Swagger UI in a browser; they can also be called from the command line or from Python.

-APIs related to usage include:
### Node Management through curl

-- /v1/models
-- /v1/chat/completions
-- /v1/completions
```shell
curl -X 'GET' \
'http://localhost:8000/nodes/status' \
-H 'accept: application/json'
```

-The usage of these APIs is the same as that of api_server.
```shell
curl -X 'POST' \
'http://localhost:8000/nodes/add' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"url": "http://0.0.0.0:23333"
}'
```

```shell
curl -X 'POST' \
'http://localhost:8000/nodes/remove?node_url=http://0.0.0.0:23333' \
-H 'accept: application/json' \
-d ''
```

### Node Management through python

```python
# query all nodes
import requests
url = 'http://localhost:8000/nodes/status'
headers = {'accept': 'application/json'}
response = requests.get(url, headers=headers)
print(response.text)
```

```python
# add a new node
import requests
url = 'http://localhost:8000/nodes/add'
headers = {
'accept': 'application/json',
'Content-Type': 'application/json'
}
data = {"url": "http://0.0.0.0:23333"}
response = requests.post(url, headers=headers, json=data)
print(response.text)
```

```python
# delete a node
import requests
url = 'http://localhost:8000/nodes/remove'
headers = {'accept': 'application/json'}
params = {'node_url': 'http://0.0.0.0:23333'}
response = requests.post(url, headers=headers, params=params)
print(response.text)
```

## Dispatch Strategy

43 changes: 43 additions & 0 deletions lmdeploy/cli/serve.py
@@ -134,6 +134,10 @@ def add_parser_api_server():
                            type=str,
                            default=['*'],
                            help='A list of allowed http headers for cors')
        parser.add_argument('--proxy-url',
                            type=str,
                            default=None,
                            help='The proxy url for api server.')
        # common args
        ArgumentHelper.backend(parser)
        ArgumentHelper.log_level(parser)
@@ -204,6 +208,36 @@ def add_parser_api_client():
                            'api key will be used')
        ArgumentHelper.session_id(parser)

    @staticmethod
    def add_parser_proxy():
        """Add parser for proxy server command."""
        parser = SubCliServe.subparsers.add_parser(
            'proxy',
            formatter_class=DefaultsAndTypesHelpFormatter,
            description=SubCliServe.proxy.__doc__,
            help=SubCliServe.proxy.__doc__)
        parser.set_defaults(run=SubCliServe.proxy)
        parser.add_argument('--server-name',
                            type=str,
                            default='0.0.0.0',
                            help='Host ip for proxy serving')
        parser.add_argument('--server-port',
                            type=int,
                            default=8000,
                            help='Server port of the proxy')
        parser.add_argument(
            '--strategy',
            type=str,
            choices=['random', 'min_expected_latency', 'min_observed_latency'],
            default='min_expected_latency',
            help='the strategy to dispatch requests to nodes')
        parser.add_argument('--api-key',
                            type=str,
                            default=None,
                            help='api key. Default to None, which means no '
                            'api key will be used')
        ArgumentHelper.ssl(parser)
    @staticmethod
    def gradio(args):
        """Serve LLMs with web UI using gradio."""
@@ -311,6 +345,7 @@ def api_server(args):
                       log_level=args.log_level.upper(),
                       api_keys=args.api_keys,
                       ssl=args.ssl,
                       proxy_url=args.proxy_url,
                       max_log_len=args.max_log_len)
@staticmethod
@@ -320,8 +355,16 @@ def api_client(args):
        kwargs = convert_args(args)
        run_api_client(**kwargs)

    @staticmethod
    def proxy(args):
        """Proxy server that manages distributed api_server nodes."""
        from lmdeploy.serve.proxy.proxy import proxy
        kwargs = convert_args(args)
        proxy(**kwargs)

    @staticmethod
    def add_parsers():
        SubCliServe.add_parser_gradio()
        SubCliServe.add_parser_api_server()
        SubCliServe.add_parser_api_client()
        SubCliServe.add_parser_proxy()
35 changes: 35 additions & 0 deletions lmdeploy/serve/openai/api_server.py
@@ -40,6 +40,9 @@ class VariableInterface:
    session_id: int = 0
    api_keys: Optional[List[str]] = None
    request_hosts = []
    # following are for registering to proxy server
    proxy_url: Optional[str] = None
    api_server_url: Optional[str] = None
> **Reviewer (Collaborator):** Is it necessary? Can we use `http://{server_ip}:{server_port}` instead?
>
> **Author (Collaborator):** It is a global variable. `VariableInterface.api_server_url = f'{http_or_https}://{server_name}:{server_port}'`

app = FastAPI(docs_url='/')
@@ -926,6 +929,33 @@ async def stream_results() -> AsyncGenerator[bytes, None]:
    return JSONResponse(ret)


@app.on_event('startup')
async def startup_event():
    """Register this api_server to the proxy server, if one is configured."""
    if VariableInterface.proxy_url is None:
        return
    try:
        import requests
        url = f'{VariableInterface.proxy_url}/nodes/add'
        data = {
            'url': VariableInterface.api_server_url,
            'status': {
                'models': get_model_list()
            }
        }
        headers = {
            'accept': 'application/json',
            'Content-Type': 'application/json'
        }
        response = requests.post(url, headers=headers, json=data)

        if response.status_code != 200:
            raise HTTPException(status_code=400,
                                detail='Service registration failed')
        print(response.text)
    except Exception as e:
        print(f'Service registration failed: {e}')


def serve(model_path: str,
          model_name: Optional[str] = None,
          backend: Literal['turbomind', 'pytorch'] = 'turbomind',
@@ -941,6 +971,7 @@ def serve(model_path: str,
          log_level: str = 'ERROR',
          api_keys: Optional[Union[List[str], str]] = None,
          ssl: bool = False,
          proxy_url: Optional[str] = None,
          max_log_len: int = None,
          **kwargs):
    """An example to perform model inference through the command line
@@ -983,6 +1014,7 @@ def serve(model_path: str,
            api key applied.
        ssl (bool): Enable SSL. Requires OS Environment variables
            'SSL_KEYFILE' and 'SSL_CERTFILE'.
        proxy_url (str): The proxy url to register the api_server.
        max_log_len (int): Max number of prompt characters or prompt tokens
            being printed in log. Default: Unlimited
    """
@@ -1019,6 +1051,9 @@ def serve(model_path: str,
        max_log_len=max_log_len,
        **kwargs)

    if proxy_url is not None:
        VariableInterface.proxy_url = proxy_url
        VariableInterface.api_server_url = f'{http_or_https}://{server_name}:{server_port}'  # noqa
    for i in range(3):
        print(
            f'HINT: Please open \033[93m\033[1m{http_or_https}://'