
Use Case: Enhancing LLM Serving with Torch Compiled RAG on AWS Graviton #3276

Merged
19 commits merged into master on Aug 2, 2024

Conversation

@agunapal (Collaborator) commented Aug 1, 2024

Description

This PR shows a use case of TorchServe for GenAI deployment.
The use case shows how torch.compile can be applied to the embedding model in a RAG pipeline to improve throughput, and how the RAG endpoint can be used alongside an LLM endpoint to improve the quality of the LLM's responses.

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A
    Logs for Test A

  • Test B
    Logs for Test B

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@agunapal agunapal marked this pull request as ready for review August 1, 2024 21:18
@mreso (Collaborator) left a comment

Overall LGTM; left some minor-to-nit comments.


### Download Llama

Follow [this instruction](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) to get permission
Collaborator:

Would be good to update this to Llama 3.1 now that it's released: https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct

Collaborator (Author):

I tried 3.1. One interesting observation: when I ask it "What's new with Llama 3.1?", it gives an acceptable answer, which shouldn't be possible. :D
So, for this use case, Llama 3 drives home the point better.


### Download Llama

Follow [this instruction](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) to get permission
Collaborator:

Suggested change
Follow [this instruction](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) to get permission
Follow [this instruction](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) to get permission

```bash
huggingface-cli login --token $HUGGINGFACE_TOKEN
```

```bash
python ../Download_model.py --model_path model --model_name meta-llama/Meta-Llama-3-8B-Instruct
```
Collaborator:

Suggested change
python ../Download_model.py --model_path model --model_name meta-llama/Meta-Llama-3-8B-Instruct
python ../Download_model.py --model_path model --model_name meta-llama/Meta-Llama-3.1-8B-Instruct

```bash
python ../Download_model.py --model_path model --model_name meta-llama/Meta-Llama-3-8B-Instruct
```
Model will be saved in the following path, `model/models--meta-llama--Meta-Llama-3-8B-Instruct`.
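The saved path follows the Hugging Face Hub cache convention: the repo id's `/` is replaced with `--` and the folder is prefixed with `models--`. A minimal sketch of that mapping (my own illustration, not part of the PR):

```python
def cache_dir_name(repo_id: str) -> str:
    # huggingface_hub stores model repos under "models--<org>--<name>"
    return "models--" + repo_id.replace("/", "--")

print(cache_dir_name("meta-llama/Meta-Llama-3-8B-Instruct"))
# models--meta-llama--Meta-Llama-3-8B-Instruct
```

This is why switching the `--model_name` to a 3.1 repo, as suggested above, also changes the saved directory name accordingly.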
Collaborator:

Suggested change
Model will be saved in the following path, `model/models--meta-llama--Meta-Llama-3-8B-Instruct`.
Model will be saved in the following path, `model/models--meta-llama--Meta-Llama-3.1-8B-Instruct`.

Resolved (outdated) review threads:
  • examples/usecases/RAG_based_LLM_serving/Deploy.md
  • examples/usecases/RAG_based_LLM_serving/README.md (3 threads)
  • examples/usecases/RAG_based_LLM_serving/rag.yaml
@agunapal agunapal enabled auto-merge August 2, 2024 22:17
@agunapal agunapal added this pull request to the merge queue Aug 2, 2024
Merged via the queue into master with commit 3f40180 Aug 2, 2024
11 of 12 checks passed
2 participants