
Add single command LLM deployment #3209

Merged · 33 commits into master · Jun 28, 2024

Conversation

@mreso (Collaborator) commented Jun 26, 2024

Description

This PR adds a feature to TorchServe to deploy an LLM with a single command.
It uses a new ts.launcher interface to start and stop TorchServe, and adds ts.llm_launcher to launch an LLM model given its Hugging Face Hub identifier using our new vLLM integration.
It also adds a Docker image, based on our GPU image, to easily run this without installing TorchServe.
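
For illustration, a minimal sketch of running the launcher without Docker; this assumes the new module can be invoked directly via `python -m` and accepts the same `--model_id` and `--disable_token` flags that the Docker image forwards:

```bash
# Sketch: serve a Hugging Face Hub model with a single command
# (assumes a local TorchServe install with the new vLLM integration)
python -m ts.llm_launcher --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token
```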

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A

```bash
docker run --rm -ti --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token
curl -X POST -d '{"prompt":"Hello, my name is", "max_new_tokens": 50}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model"
```

Logs for Test A

{"text": " [", "tokens": 510}{"text": "Your", "tokens": 7927}{"text": " Name", "tokens": 4076}{"text": "].", "tokens": 948}{"text": " I", "tokens": 358}{"text": " am", "tokens": 1097}{"text": " a", "tokens": 264}{"text": " [", "tokens": 510}{"text": "Your", "tokens": 7927}{"text": " Profession", "tokens": 50311}{"text": "/", "tokens": 14}{"text": "Student", "tokens": 14428}{"text": "]", "tokens": 60}{"text": " and", "tokens": 323}{"text": " I", "tokens": 358}{"text": " am", "tokens": 1097}

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@mreso requested a review from @agunapal, June 26, 2024 21:38
Review threads on docker/Dockerfile.llm and ts/llm_launcher.py were marked resolved.
@mreso marked this pull request as ready for review June 27, 2024 17:23
@mreso requested a review from @agunapal, June 27, 2024 17:24
@agunapal (Collaborator) left a comment:

Looks good!
Had one comment on the token. Please check and change it if you think it makes sense.


You can then go ahead and launch a TorchServe instance serving your selected model:

```bash
docker run --rm -ti --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token
```
@agunapal (Collaborator) commented:

Is `-e HUGGING_FACE_HUB_TOKEN=$token` needed? Why not directly set `export HUGGING_FACE_HUB_TOKEN=<HUGGINGFACE_HUB_TOKEN>`?

Also, from the security POV, does the existing command print the token?

@mreso (Collaborator, Author) replied:

AFAIK Docker will not pick up environment variables from the calling environment, so you would still have

```bash
export HUGGING_FACE_HUB_TOKEN=<HUGGINGFACE_HUB_TOKEN>
docker run --rm -ti --gpus all -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN ...
```

which is even longer, and it comes down to the same process. For a real-life deployment the token variable would be set through a secret, so the token would not show up in any of the logs.
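
For reference, Docker also offers a pass-through form in which `-e` is given only the variable name and the value is read from the calling environment, so the token never appears in the command line itself:

```bash
# The value of HUGGING_FACE_HUB_TOKEN is read from the caller's environment
export HUGGING_FACE_HUB_TOKEN=<HUGGINGFACE_HUB_TOKEN>
docker run --rm -ti --gpus all -e HUGGING_FACE_HUB_TOKEN -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token
```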

@mreso added this pull request to the merge queue Jun 28, 2024
Merged via the queue into master with commit 160bee7 Jun 28, 2024
12 checks passed