Add FlashInfer to default Dockerfile #6172

Merged: 1 commit into vllm-project:main on Jul 8, 2024

Conversation

@simon-mo (Collaborator) commented on Jul 6, 2024

Closes #6169

Testing with

docker run --gpus all -p 8000:8000 -e HF_TOKEN --ipc=host \
    --env "VLLM_ATTENTION_BACKEND=FLASHINFER" \
    -v /data/xmo/hub:/root/.cache/huggingface \
    vllm/vllm-openai --model google/gemma-2-9b-it
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "google/gemma-2-9b-it",
        "prompt": "Who won the world series in 2020?",
        "max_tokens": 100,
        "ignore_eos": true
    }'
{"id":"cmpl-8ce64ceae52449e2b04988b08f3f42f9","object":"text_completion","created":1720253967,"model":"google/gemma-2-9b-it","choices":[{"index":0,"text":"\n\nThe **Los Angeles Dodgers** won the World Series in 2020. \n\\\\\n  \\\\\n\n\n\\\\\n\\\\\n\\\n\n\\\\\n\\\\\n\n\n\n\n.\n\n\n'.\n\n\n\n\n\n **\n\n\n\n\n。","logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":13,"total_tokens":113,"completion_tokens":100}}(miniconda3) (base) [xmo@flow-matic:/data/xmo/vllm]$ 

The Docker images for vllm/vllm-openai:latest and vllm/vllm-openai:v0.5.1 have been built and updated. This doesn't affect the wheel build.
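
The Dockerfile diff itself isn't shown in this thread; conceptually, the change installs a prebuilt FlashInfer wheel into the image so that VLLM_ATTENTION_BACKEND=FLASHINFER works out of the box. A rough sketch follows; the pinned version and wheel URL are assumptions, not the actual diff:

# Hypothetical sketch of the Dockerfile addition; the pinned FlashInfer
# version and wheel URL in the merged commit may differ.
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install https://github.com/flashinfer-ai/flashinfer/releases/download/v0.0.8/flashinfer-0.0.8+cu121torch2.3-cp310-cp310-linux_x86_64.whl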

@WoosukKwon (Collaborator) left a comment

LGTM!

@zhyncs (Contributor) left a comment

FlashInfer 0.9.0 has been optimized for GQA; maybe we can wait until version 0.9.0 is released before integrating it into the Dockerfile.

@simon-mo merged commit 4f0e0ea into vllm-project:main on Jul 8, 2024 (70 checks passed).
@simon-mo (Collaborator, Author) commented on Jul 8, 2024

We already released v0.8.0 and will be able to update right after!

@zhyncs (Contributor) commented on Jul 8, 2024

ok

dtrifiro pushed a commit to opendatahub-io/vllm that referenced this pull request on Jul 17, 2024
xjpang pushed a commit to xjpang/vllm that referenced this pull request on Jul 24, 2024
Alvant pushed a commit to compressa-ai/vllm that referenced this pull request on Oct 26, 2024

Successfully merging this pull request may close the following issue:

[Bug]: TypeError: 'NoneType' object is not callable when loading Gemma 2 9B with new 0.5.1 version (#6169)