Feature add openai api for vllm integration #3287

Merged: 16 commits merged into master, Aug 23, 2024

Conversation

@mreso (Collaborator) commented Aug 16, 2024

Description

This PR enables an OpenAI-compatible API for the vLLM integration.
It removes the previously undefined interface for LLMs served through the vLLM engine.

The new interface can be used like an OpenAI endpoint:
Curl:

curl --header "Content-Type: application/json" --request POST --data @prompt.json http://localhost:8080/predictions/llama-8b-lora/1.0/v1/completions
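
Here prompt.json contains the request body. A minimal sketch, assuming the standard OpenAI completions request schema (which fields the handler honors may vary):

{
    "model": "llama-8b-lora",
    "prompt": "Hello world",
    "max_tokens": 50,
    "temperature": 0.0
}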

Python + requests:

 python ../../utils/test_llm_streaming_response.py -m llama-8b-lora -o 50 -t 2 -n 4 --prompt-text "@prompt.json" --prompt-json --openai-api --demo-streaming

OpenAI client:

from openai import OpenAI

model_name = "llama-8b-lora"
stream = True
openai_api_key = "EMPTY"
openai_api_base = f"http://localhost:8080/predictions/{model_name}/1.0/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

response = client.completions.create(
    model=model_name, prompt="Hello world", temperature=0.0, stream=stream
)
for chunk in response:
    print(f"{chunk=}")
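
With stream=True, the client returns an iterator of incremental chunks rather than a single response object; for the completions API, each chunk's generated text is typically found in chunk.choices[0].text.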

Supported and planned APIs:

  • "/v1/completions"
  • "/v1/chat/completions"
  • "/v1/models"
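
Of these routes, the chat route can be exercised through the same OpenAI client. A minimal sketch, assuming "/v1/chat/completions" is served under the same predictions path as "/v1/completions" above (hypothetical usage, not confirmed by this PR):

from openai import OpenAI

model_name = "llama-8b-lora"
client = OpenAI(
    api_key="EMPTY",
    base_url=f"http://localhost:8080/predictions/{model_name}/1.0/v1",
)

# Standard OpenAI chat schema; the fields the handler accepts may differ.
response = client.chat.completions.create(
    model=model_name,
    messages=[{"role": "user", "content": "Hello world"}],
    temperature=0.0,
)
print(response.choices[0].message.content)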

Type of change


  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing


  • pytest test/pytest/test_example_vllm.py
    Logs for Test A:
=========================== test session starts ===========================
platform linux -- Python 3.11.9, pytest-7.3.1, pluggy-1.0.0
rootdir: /home/ubuntu/serve
plugins: cov-4.1.0, anyio-4.4.0, mock-3.14.0
collected 6 items

test/pytest/test_example_vllm.py 2024-08-17T04:04:33,713 [INFO ] W-29500-Meta-Llama-31-8B_1.0-stdout MODEL_LOG - INFO 08-17 04:04:33 chat_utils.py:90]
2024-08-17T04:04:33,713 [INFO ] W-29500-Meta-Llama-31-8B_1.0-stdout MODEL_LOG - INFO 08-17 04:04:33 chat_utils.py:90] {%- for message in messages %}
2024-08-17T04:04:33,713 [INFO ] W-29500-Meta-Llama-31-8B_1.0-stdout MODEL_LOG - INFO 08-17 04:04:33 chat_utils.py:90]     {%- if not (message.role == 'ipython' or message.role == 'tool' or 'tool_calls' in message) %}
2024-08-17T04:04:33,713 [INFO ] W-29500-Meta-Llama-31-8B_1.0-stdout MODEL_LOG - INFO 08-17 04:04:33 chat_utils.py:90]         {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' }}
2024-08-17T04:04:33,713 [INFO ] W-29500-Meta-Llama-31-8B_1.0-stdout MODEL_LOG - INFO 08-17 04:04:33 chat_utils.py:90]     {%- elif 'tool_calls' in message %}
2024-08-17T04:04:33,713 [INFO ] W-29500-Meta-Llama-31-8B_1.0-stdout MODEL_LOG - INFO 08-17 04:04:33 chat_utils.py:90]         {%- if not message.tool_calls|length == 1 %}
.2024-08-17T04:04:39,102 [INFO ] epollEventLoopGroup-5-1 TS_METRICS - ts_queue_latency_microseconds.Microseconds:0.0|#model_name:Meta-Llama-31-8B,model_version:1.0|#hostname:ip-172-31-3-46,timestamp:1723867479
2024-08-17T04:04:39,102 [DEBUG] epollEventLoopGroup-5-1 org.pytorch.serve.job.RestJob - Waiting time ns: 0, Backend time ns: 989658222
2024-08-17T04:04:39,102 [INFO ] epollEventLoopGroup-5-1 TS_METRICS - QueueTime.Milliseconds:0.0|#Level:Host|#hostname:ip-172-31-3-46,timestamp:1723867479
.2024-08-17T04:04:40,098 [INFO ] epollEventLoopGroup-5-1 TS_METRICS - QueueTime.Milliseconds:0.0|#Level:Host|#hostname:ip-172-31-3-46,timestamp:1723867480
.2024-08-17T04:04:43,200 [DEBUG] epollEventLoopGroup-5-1 org.pytorch.serve.job.RestJob - Waiting time ns: 0, Backend time ns: 3093682692
2024-08-17T04:04:43,200 [INFO ] epollEventLoopGroup-5-1 TS_METRICS - QueueTime.Milliseconds:0.0|#Level:Host|#hostname:ip-172-31-3-46,timestamp:1723867483
...                                                                 [100%]

============================== warnings summary ===========================
../miniconda3/envs/serve/lib/python3.11/site-packages/pyairports/airports.py:1
  /home/ubuntu/miniconda3/envs/serve/lib/python3.11/site-packages/pyairports/airports.py:1: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    from pkg_resources import resource_string

../miniconda3/envs/serve/lib/python3.11/site-packages/pkg_resources/__init__.py:2832
  /home/ubuntu/miniconda3/envs/serve/lib/python3.11/site-packages/pkg_resources/__init__.py:2832: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('ruamel')`.
  Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
    declare_namespace(pkg)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===================== 6 passed, 2 warnings in 28.55s ======================

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@mreso marked this pull request as ready for review August 17, 2024 04:06
@agunapal (Collaborator) left a comment:

LGTM

nit: Can you please add a comment on how this is being handled in the frontend?

@mreso enabled auto-merge August 23, 2024 22:28
@mreso added this pull request to the merge queue Aug 23, 2024
Merged via the queue into master with commit db1a003 Aug 23, 2024
12 checks passed