TorchServe quickstart chatbot example #3003
Conversation
@@ -9,6 +9,33 @@ We are using [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) in
You can run this example on your laptop to understand how to use TorchServe

## Quick Start Guide
We can be more ambitious and make this our new getting started guide.
My goal is to do a 3-part solution:
- chatbot quickstart with streamlit -> because chatbots are popular
- TS multi-model app to show TS' full capability -> use this to create a video series
- quickstart script for common use-cases with curl commands -> this can be the getting started guide
# 2: Build TorchServe Image for Serving llama2-7b model with 4-bit quantization
./examples/llm/llama2/chat_app/docker/build_image.sh meta-llama/Llama-2-7b-chat-hf

# 3: Launch the streamlit app for server & client
I know it's not exactly what you might have in mind, but I was thinking this would open a terminal-based CLI.
I was thinking about how to cover various scenarios. So, my goal is to do a 3-part solution:
- chatbot quickstart with streamlit -> because chatbots are popular
- TS multi-model app to show TS' full capability -> use this to create a video series
- quickstart script for common use-cases with curl commands -> this can be the getting started guide
RUN pip install -r /home/model-server/chat_bot/requirements.txt && huggingface-cli login --token $HUGGINGFACE_TOKEN
RUN pip uninstall torchtext torchdata torch torchvision torchaudio -y
RUN pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu --ignore-installed
RUN pip uninstall torchserve torch-model-archiver torch-workflow-archiver -y
seems like a miss?
You are right, this is not needed for this example. Will clean it up.
Done
ARG MODEL_NAME
ARG HUGGINGFACE_TOKEN

USER root
do you need root?
Yes, we don't have permission to install packages as the default user.
def start_server():
    os.system("torchserve --start --ts-config /home/model-server/config.properties")
This would show success even if the server failed to start. Favor subprocess instead: check the return code, then query TorchServe directly for server health rather than using sleep.
Good point, let me try. This command was returning immediately. I did try ping, but it was failing because the server was not up yet.
I changed the logic, but it still doesn't work as expected. There is a slight gap between when the command returns and when the server is actually up. I added a check with ping, but it still needs a sleep.
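A minimal sketch of the suggested approach, assuming TorchServe's default inference port 8080 and its `/ping` health endpoint; the config path is taken from the snippet above, and the exact timeouts are illustrative:

```python
import subprocess
import time
import urllib.error
import urllib.request

def start_server(config="/home/model-server/config.properties"):
    """Start TorchServe; unlike os.system, surface a launch failure loudly."""
    result = subprocess.run(
        ["torchserve", "--start", "--ts-config", config],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"torchserve failed to start: {result.stderr}")

def wait_for_healthy(url="http://localhost:8080/ping", timeout=60, interval=2):
    """Poll the health endpoint until the server answers or we time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not accepting connections yet
        time.sleep(interval)
    return False
```

Polling replaces the fixed sleep: the loop returns as soon as the server answers and times out cleanly if it never comes up.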
### What to expect
This launches two streamlit apps:
1. TorchServe Server app to start/stop TorchServe, load model, scale up/down workers, configure dynamic batch_size (currently llama-cpp-python doesn't support batch_size > 1)
It's a bit painful to use llama-cpp here; I was hoping we could instead showcase an example with export or with MPS in eager mode.
I tried a few things:
- Use HF 7b models with quantization -> only supported on CUDA
- Use HF 7b models without quantization on CPU -> extremely slow; no one would use this
- Docker with MPS -> seems like this is still not supported; even PyTorch supports only CPU in Docker (MPS-Ready, ARM64 Docker Image pytorch#81224)
So currently this seems like the best solution. Some people have tried mistral7b with llama-cpp-python. It's kind of mind-blowing that most existing solutions are targeted only at the GPU-rich.
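The Server app's worker scaling maps onto TorchServe's REST management API; a hedged sketch, assuming the default management port 8081 (the model name `llamacpp` below is only a placeholder, not the name registered by this example):

```python
import urllib.request

def scaling_url(model_name, min_worker, host="http://localhost:8081"):
    """Build the management-API URL to scale a model's workers."""
    return f"{host}/models/{model_name}?min_worker={min_worker}"

def scale_workers(model_name, min_worker):
    """PUT to the management API to adjust the worker count for a model."""
    req = urllib.request.Request(scaling_url(model_name, min_worker), method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

The same operation from a shell would be `curl -X PUT "http://localhost:8081/models/llamacpp?min_worker=2"`.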
Description
This PR enables a new user of TorchServe to quickly launch a chatbot on Mac M1/M2 using TorchServe with 3 commands.
Prerequisites:
Fixes #(issue)
Type of change
Please delete options that are not relevant.
Feature/Issue validation/testing
Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Test A
Logs for Test A
Test B
Logs for Test B
Checklist: