Use Case: Enhancing LLM Serving with Torch Compiled RAG on AWS Graviton #3276
Conversation
Overall LGTM, left some minor-to-nit comments.
### Download Llama

Follow [this instruction](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) to get permission
Would be good to update this to Llama 3.1 now that it's released: https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct
I tried 3.1. One interesting observation: when I asked it "What's new with Llama 3.1", it gave an acceptable answer, which shouldn't be possible. :D
So, for this use case, Llama 3 drives home the point better.
### Download Llama

Follow [this instruction](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) to get permission
Suggested change: replace
`Follow [this instruction](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) to get permission`
with
`Follow [this instruction](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) to get permission`
huggingface-cli login --token $HUGGINGFACE_TOKEN

```bash
python ../Download_model.py --model_path model --model_name meta-llama/Meta-Llama-3-8B-Instruct
```
Suggested change: replace
`python ../Download_model.py --model_path model --model_name meta-llama/Meta-Llama-3-8B-Instruct`
with
`python ../Download_model.py --model_path model --model_name meta-llama/Meta-Llama-3.1-8B-Instruct`
```bash
python ../Download_model.py --model_path model --model_name meta-llama/Meta-Llama-3-8B-Instruct
```
The model will be saved in the following path: `model/models--meta-llama--Meta-Llama-3-8B-Instruct`.
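As an aside on the save path above: the Hugging Face hub cache names a repo's directory by prefixing `models--` and replacing the `/` in the repo id with `--`, which is why the model lands under `models--meta-llama--Meta-Llama-3-8B-Instruct`. A minimal sketch of that mapping (the helper name `hf_cache_dir_name` is ours for illustration, not a hub API):

```python
def hf_cache_dir_name(repo_id: str) -> str:
    """Return the directory name the HF hub cache uses for a model repo id."""
    # e.g. "meta-llama/Meta-Llama-3-8B-Instruct"
    #   -> "models--meta-llama--Meta-Llama-3-8B-Instruct"
    return "models--" + repo_id.replace("/", "--")

print(hf_cache_dir_name("meta-llama/Meta-Llama-3-8B-Instruct"))
# -> models--meta-llama--Meta-Llama-3-8B-Instruct
```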
Suggested change: replace
`Model will be saved in the following path, `model/models--meta-llama--Meta-Llama-3-8B-Instruct`.`
with
`Model will be saved in the following path, `model/models--meta-llama--Meta-Llama-3.1-8B-Instruct`.`
Co-authored-by: Matthias Reso <13337103+mreso@users.noreply.github.com>
Description
This PR shows a use case of TorchServe for GenAI deployment.
The use case shows how torch.compile can be applied to the embedding model in a RAG pipeline to improve throughput, and how this RAG endpoint can be used alongside an LLM endpoint to improve the LLM's results.
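For illustration, the retrieve-then-augment flow the description refers to can be sketched in plain Python. This is a hedged sketch, not the PR's actual code: `embed()` here is a hypothetical bag-of-words stand-in for the neural embedding model (which, in the use case, is the component wrapped with `torch.compile`), and all function names are illustrative.

```python
import math

def embed(text: str) -> dict:
    # Hypothetical stand-in embedder: bag-of-words token counts.
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    # Cosine similarity between two sparse vectors.
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list, k: int = 1) -> list:
    # Rank documents by similarity to the query embedding, keep top-k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list) -> str:
    # Prepend the retrieved context to the question before sending it
    # to the LLM endpoint.
    context = "\n".join(retrieve(query, docs))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Llama 3.1 adds a 128K context window and multilingual support.",
    "TorchServe is a tool for serving PyTorch models in production.",
]
print(build_prompt("What's new with Llama 3.1?", docs))
```

With retrieval supplying up-to-date context, even a model that predates Llama 3.1 can answer the question above, which is exactly the behavior the use case demonstrates.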
Fixes #(issue)
Type of change
Please delete options that are not relevant.
Feature/Issue validation/testing
Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
Test A
Logs for Test A
Test B
Logs for Test B
Checklist: