Support vllm openai api server #694

zmvictor · 2024-06-04T05:41:11Z

Per https://docs.vllm.ai/en/latest/serving/metrics.html, the vLLM OpenAI-compatible API server exposes serving metrics by default. This PR therefore:

- updates the API server from vanilla mode to OpenAI mode
- adds the swap_space argument suggested in the vLLM benchmarks

Tested end to end with the model meta-llama/Llama-2-7b-chat-hf. After terraform apply:

# Get the external IP of the vLLM load balancer
$ VLLM_EXTERNAL_IP=$(kubectl -n benchmark get service vllm -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Send a prompt to the endpoint
$ curl "$VLLM_EXTERNAL_IP/v1/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-2-7b-chat-hf",
        "prompt": "Seattle City is a",
        "max_tokens": 7,
        "temperature": 0
    }'

# Check Prometheus metrics
$ curl "$VLLM_EXTERNAL_IP/metrics/"

...
# TYPE vllm:prompt_tokens_total counter
vllm:prompt_tokens_total{model_name="meta-llama/Llama-2-7b-chat-hf"} 9.0
...
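To check a counter like the one above from a script rather than by eye, the metrics text can be parsed directly. The following is a minimal sketch, not part of this PR: `read_metric` is a hypothetical helper, and it assumes each sample line has the form `name{labels} value` with no trailing timestamp.

```python
def read_metric(metrics_text, name):
    """Return {series: value} for every sample of metric `name`.

    Assumes Prometheus text exposition lines of the form
    `name{labels} value` without a trailing timestamp.
    """
    values = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        series, _, value = line.rpartition(" ")
        # The metric name is everything before the label set, if any.
        if series.split("{", 1)[0] == name:
            values[series] = float(value)
    return values


# Sample text copied from the curl output in this PR description.
sample = (
    "# TYPE vllm:prompt_tokens_total counter\n"
    'vllm:prompt_tokens_total{model_name="meta-llama/Llama-2-7b-chat-hf"} 9.0\n'
)
prompt_tokens = read_metric(sample, "vllm:prompt_tokens_total")
```

In a real check, `sample` would be the body returned by the metrics endpoint.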

achandrasekar

Thanks for adding this!

achandrasekar · 2024-06-05T20:44:21Z

Hi @zmvictor, I just remembered that this breaks the benchmark automation we have for vLLM, which still uses the /generate API rather than the /completions API: https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/benchmarks/benchmark/tools/locust-load-inference/locust-docker/locust-tasks/tasks.py#L172. It would be good to address that too.
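One way the load-test code could handle both endpoints is sketched below. This is an illustration under assumptions, not the actual tasks.py code: `build_request` and `extract_text` are hypothetical names, the `/v1/completions` body mirrors the curl example in this PR, and the `/generate` request and response shapes are assumed to match vLLM's demo API server (a `prompt` plus sampling parameters in, a `text` list out).

```python
def build_request(prompt, model, max_tokens, use_openai_api=True):
    """Return (path, json_body) for a single-prompt request."""
    if use_openai_api:
        # The OpenAI-compatible server requires the model name in the body.
        return "/v1/completions", {
            "model": model,
            "prompt": prompt,
            "max_tokens": max_tokens,
            "temperature": 0,
        }
    # Assumed shape for the vanilla server: no model field.
    return "/generate", {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0,
    }


def extract_text(response_json, use_openai_api=True):
    """Pull the generated text out of a parsed JSON response."""
    if use_openai_api:
        return response_json["choices"][0]["text"]
    # Assumed response shape for /generate: {"text": ["..."]}.
    return response_json["text"][0]


path, body = build_request(
    "Seattle City is a", "meta-llama/Llama-2-7b-chat-hf", max_tokens=7
)
```

A Locust task could then post `body` to `path` and pass the parsed response to `extract_text`, with the flag chosen by deployment configuration.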

Support vllm openai api server

54abe2e

zmvictor requested review from achandrasekar, ahg-g and annapendleton as code owners June 4, 2024 05:41

achandrasekar approved these changes Jun 4, 2024


annapendleton approved these changes Jun 5, 2024


make terraform link happy

1803c04

zmvictor merged commit f3e12b3 into GoogleCloudPlatform:main Jun 5, 2024
5 checks passed

Edwinhr716 mentioned this pull request Aug 14, 2024

Error: "POST /generate HTTP/1.1" 404 Not Found when running Locust tool against vLLM model server #777

Open
