
Add single command LLM deployment #3209

Merged · 33 commits into master · Jun 28, 2024

Conversation

@mreso (Collaborator) commented Jun 26, 2024

Description

This PR adds a feature to TorchServe to deploy an LLM with a single command.
It uses a new ts.launcher interface to start and stop TorchServe, and adds ts.llm_launcher to launch an LLM model given its Hugging Face Hub identifier using our new vLLM integration.
It also adds a Docker image, based on our GPU image, to easily run this without installing TorchServe.
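
For illustration, a minimal sketch of running the launcher without Docker; this assumes the new module can be invoked directly via `python -m` and accepts the same `--model_id` and `--disable_token` flags that the Docker image forwards:

```bash
# Sketch: serve a Hugging Face Hub model with a single command
# (assumes a local TorchServe install with the new vLLM integration)
python -m ts.llm_launcher --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token
```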

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A

```bash
docker run --rm -ti --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token
curl -X POST -d '{"prompt":"Hello, my name is", "max_new_tokens": 50}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model"
```

Logs for Test A

{"text": " [", "tokens": 510}{"text": "Your", "tokens": 7927}{"text": " Name", "tokens": 4076}{"text": "].", "tokens": 948}{"text": " I", "tokens": 358}{"text": " am", "tokens": 1097}{"text": " a", "tokens": 264}{"text": " [", "tokens": 510}{"text": "Your", "tokens": 7927}{"text": " Profession", "tokens": 50311}{"text": "/", "tokens": 14}{"text": "Student", "tokens": 14428}{"text": "]", "tokens": 60}{"text": " and", "tokens": 323}{"text": " I", "tokens": 358}{"text": " am", "tokens": 1097}

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@mreso requested a review from @agunapal, June 26, 2024 21:38
Review threads on docker/Dockerfile.llm and ts/llm_launcher.py were marked resolved.
@mreso marked this pull request as ready for review June 27, 2024 17:23
@mreso requested a review from @agunapal, June 27, 2024 17:24
@agunapal (Collaborator) left a comment:

Looks good!
Had one comment on the token. Please check and change it if you think it makes sense.


You can then go ahead and launch a TorchServe instance serving your selected model:

```bash
docker run --rm -ti --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token
```
@agunapal (Collaborator) commented:

Is `-e HUGGING_FACE_HUB_TOKEN=$token` needed? Why not directly set `export HUGGING_FACE_HUB_TOKEN=<HUGGINGFACE_HUB_TOKEN>`?

Also, from the security POV, does the existing command print the token?

@mreso (Collaborator, Author) replied:

AFAIK Docker will not pick up environment variables from the calling environment, so you would still have

```bash
export HUGGING_FACE_HUB_TOKEN=<HUGGINGFACE_HUB_TOKEN>
docker run --rm -ti --gpus all -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN ...
```

which is even longer, and it comes down to the same process. For a real-life deployment the token variable would be set through a secret, so the token would not show up in any of the logs.
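
For reference, Docker also offers a pass-through form in which `-e` is given only the variable name and the value is read from the calling environment, so the token never appears in the command line itself:

```bash
# The value of HUGGING_FACE_HUB_TOKEN is read from the caller's environment
export HUGGING_FACE_HUB_TOKEN=<HUGGINGFACE_HUB_TOKEN>
docker run --rm -ti --gpus all -e HUGGING_FACE_HUB_TOKEN -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token
```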

@mreso added this pull request to the merge queue Jun 28, 2024
Merged via the queue into master with commit 160bee7 Jun 28, 2024
12 checks passed