
Example llama3 on inf2 #3133

Merged
merged 6 commits into from
May 8, 2024

Conversation

Collaborator

@lxning lxning commented May 4, 2024

Description

Please read our CONTRIBUTING.md prior to creating your first pull request.

Please include a summary of the feature or issue being fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes #(issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test continuous batching
python examples/large_models/utils/test_llm_streaming_response.py -m llama-3-70b -o 50 -t 2 -n 4 --prompt-text "Today the weather is really nice and I am planning on " --prompt-randomize
Tasks are completed
payload={'prompt': 'q a k h f n u u p w d l k w g q s Today the weather is really nice and I am planning on ', 'max_new_tokens': 67}
, output=q a k h f n u u p w d l k w g q s Today the weather is really nice and I am planning on  1) ___________ to the beach with my friends. We are going to 2) ___________ a picnic and 3) ___________ some games. I am going to 4) ___________ my camera so I can take some pictures. I am also going to 5) ___________ my sunglasses and

payload={'prompt': 'i y b n d d n k l o r j j o x c u Today the weather is really nice and I am planning on ', 'max_new_tokens': 67}
, output=i y b n d d n k l o r j j o x c u Today the weather is really nice and I am planning on  1) (go) to the beach with my friends. We are going to 2) (take) a picnic and 3) (have) a great time. I am going to 4) (wear) my new bikini and 5) (lie) on the beach all day. I am going to 6

payload={'prompt': 'a x y w x Today the weather is really nice and I am planning on ', 'max_new_tokens': 55}
, output=a x y w x Today the weather is really nice and I am planning on  1. going to the beach. 2. going to the park. 3. going to the cinema. 4. going to the zoo. 5. going to the museum. 6. going to the theatre. 7. going to the swimming pool

payload={'prompt': 't z c n j o t i o h z n s r f Today the weather is really nice and I am planning on ', 'max_new_tokens': 65}
, output=t z c n j o t i o h z n s r f Today the weather is really nice and I am planning on  1) going to the beach. I am going to take my 2) camera with me. I am going to take some 3) pictures of the 4) sea and the 5) sand. I am going to take my 6) sunglasses with me because the sun is really 7) bright.

payload={'prompt': 'p d Today the weather is really nice and I am planning on ', 'max_new_tokens': 52}
, output=p d Today the weather is really nice and I am planning on   going to the beach. I am going to the beach with my family. I am going to the beach with my family because I want to have fun with them. I am going to the beach with my family because I want to have fun with them.

payload={'prompt': 'v l j d c h Today the weather is really nice and I am planning on ', 'max_new_tokens': 56}
, output=v l j d c h Today the weather is really nice and I am planning on  2 things. First, I am going to go to the park and play some basketball. Then, I am going to go to the mall and buy some new clothes. I am going to buy a new pair of shoes, a new shirt, and a new pair of pants.

payload={'prompt': 'v m w i s x x x w l g c Today the weather is really nice and I am planning on ', 'max_new_tokens': 62}
, output=v m w i s x x x w l g c Today the weather is really nice and I am planning on  1) going to the beach 2) going to the park 3) going to the mall 4) going to the movies 5) going to the zoo 6) going to the museum 7) going to the library 8) going to the park 9) going to the mall

payload={'prompt': 'e c k e l b j p j s Today the weather is really nice and I am planning on ', 'max_new_tokens': 60}
, output=e c k e l b j p j s Today the weather is really nice and I am planning on  1. going to the beach 2. going to the park 3. going to the mall 4. going to the movies 5. going to the zoo 6. going to the library 7. going to the museum 8. going to the park 9. going to
  • Test microbatch + streamer
python examples/large_models/utils/test_llm_streaming_response.py -m llama-3-70b -o 50 -t 2 -n 4 --prompt-text "Today the weather is really nice and I am planning on "
Tasks are completed
payload={'prompt': 'Today the weather is really nice and I am planning on ', 'max_new_tokens': 50}
, output=Today the weather is really nice and I am planning on  going to the beach. I am going to take my camera and take some pictures. I am also going to take my sketchbook and draw some pictures. I am going to take my sketchbook and draw some pictures. I am going to take

payload={'prompt': 'Today the weather is really nice and I am planning on ', 'max_new_tokens': 50}
, output=Today the weather is really nice and I am planning on  going to the beach. I am going to take my camera and take some pictures. I am also going to take my sketchbook and draw some pictures. I am going to take my sketchbook and draw some pictures. I am going to take

payload={'prompt': 'Today the weather is really nice and I am planning on ', 'max_new_tokens': 50}
, output=Today the weather is really nice and I am planning on  going to the beach. I am going to take my camera and take some pictures. I am also going to take my sketchbook and draw some pictures. I am going to take my sketchbook and draw some pictures. I am going to take

payload={'prompt': 'Today the weather is really nice and I am planning on ', 'max_new_tokens': 50}
, output=Today the weather is really nice and I am planning on  going to the beach. I am going to take my camera and take some pictures. I am also going to take my sketchbook and draw some pictures. I am going to take my sketchbook and draw some pictures. I am going to take

payload={'prompt': 'Today the weather is really nice and I am planning on ', 'max_new_tokens': 50}
, output=Today the weather is really nice and I am planning on  going to the beach. I am going to take my camera and take some pictures. I am also going to take my sketchbook and draw some pictures. I am going to take my sketchbook and draw some pictures. I am going to take

payload={'prompt': 'Today the weather is really nice and I am planning on ', 'max_new_tokens': 50}
, output=Today the weather is really nice and I am planning on  going to the beach. I am going to take my camera and take some pictures. I am also going to take my sketchbook and draw some pictures. I am going to take my sketchbook and draw some pictures. I am going to take

payload={'prompt': 'Today the weather is really nice and I am planning on ', 'max_new_tokens': 50}
, output=Today the weather is really nice and I am planning on  going to the beach. I am going to take my camera and take some pictures. I am also going to take my sketchbook and draw some pictures. I am going to take my sketchbook and draw some pictures. I am going to take

payload={'prompt': 'Today the weather is really nice and I am planning on ', 'max_new_tokens': 50}
, output=Today the weather is really nice and I am planning on  going to the beach. I am going to take my camera and take some pictures. I am also going to take my sketchbook and draw some pictures. I am going to take my sketchbook and draw some pictures. I am going to take
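Comparing the two runs above, the --prompt-randomize flag appears to prepend a short run of random single-letter "tokens" to each prompt, so every request carries a distinct prefix and the server has to schedule genuinely different sequences under continuous batching. A minimal sketch of that payload construction (build_payload and its parameters are hypothetical helpers for illustration, not the actual code in test_llm_streaming_response.py):

```python
import random
import string


def build_payload(prompt, max_new_tokens, randomize=False, max_prefix_len=20):
    """Build a request payload, optionally prepending a random token prefix.

    With randomize=True, a short run of random single-letter words is
    prepended (mimicking what the logged payloads above suggest
    --prompt-randomize does), which defeats any prompt/prefix caching.
    """
    if randomize:
        n = random.randint(1, max_prefix_len)
        prefix = " ".join(random.choices(string.ascii_lowercase, k=n))
        prompt = f"{prefix} {prompt}"
    return {"prompt": prompt, "max_new_tokens": max_new_tokens}


# Example: one randomized payload like those in the test log above.
payload = build_payload(
    "Today the weather is really nice and I am planning on ", 50, randomize=True
)
print(payload)
```

Each payload is then sent as a separate concurrent streaming request (the -t/-n flags above control threads and requests per thread).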

Checklist:

  • Did you have fun?
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

@lxning lxning added the example label May 4, 2024
@lxning lxning added this to the v0.11.0 milestone May 4, 2024
@lxning lxning requested review from mreso and agunapal May 4, 2024 02:40
@lxning lxning self-assigned this May 4, 2024
Collaborator

@mreso mreso left a comment


LGTM, left some comments; please address the move of the neuron handlers into the example folder before merging.

@@ -1,6 +1,6 @@
# Large model inference on Inferentia2

- This folder briefs on serving the [Llama 2](https://huggingface.co/meta-llama) model on [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) for text completion with TorchServe's features:
+ This folder briefs on serving the [Llama 2 and Llama 3](https://huggingface.co/meta-llama) model a on [AWS Inferentia2](https://aws.amazon.com/ec2/instance-types/inf2/) for text completion with TorchServe's features:
Collaborator

"...model on an AWS Inferentia2..."?

Collaborator

Seems like something went wrong with your previous PR, where you said you moved the handlers into the example dir. Please make sure the file in ts/torch_handler/distributed/ gets cleaned up. My assumption is that this file got moved here. Please clarify if there were changes.

Collaborator Author

The previous PR was about the micro-batching + streamer handler; this one is about continuous batching. They are two different base handlers.

Collaborator

I see, not sure if we're referring to the same PR. I meant #3035, which touched ts/torch_handler/distributed/base_neuronx_continuous_batching_handler.py as well as files under examples/large_models/inferentia2/llama2/continuous_batching, so I assumed that we were talking about moving the cb_handler as well. Anyway, please make sure to remove ts/torch_handler/distributed/base_neuronx_continuous_batching_handler.py with this PR.

tp_degree: 24
max_length: 256
max_new_tokens: 50
batch_size: 8
Collaborator

Was paged attention supported on Inferentia? Does a batch size of 8 give enough flexibility in that case? It would be good to discuss this in the documentation.

Collaborator Author

Will run benchmark and update batch size.
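For context, the hunk discussed above is a fragment of the example's model-config.yaml. A fuller (hypothetical) configuration might look like the sketch below; only tp_degree, max_length, max_new_tokens, and batch_size appear in this PR, while the frontend keys are standard TorchServe model-config settings shown here with illustrative values:

```yaml
# Hypothetical model-config.yaml sketch; frontend values are illustrative.
minWorkers: 1
maxWorkers: 1
responseTimeout: 10800
handler:
    tp_degree: 24
    max_length: 256
    max_new_tokens: 50
    batch_size: 8
```

Here tp_degree sets the tensor-parallel degree across NeuronCores, and batch_size bounds how many sequences the continuous-batching scheduler can keep in flight at once, which is what the reviewer's flexibility question is about.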

@lxning lxning enabled auto-merge May 8, 2024 03:58
@mreso mreso disabled auto-merge May 8, 2024 04:25
@lxning lxning added this pull request to the merge queue May 8, 2024
Merged via the queue into master with commit 0b4539f May 8, 2024
10 of 12 checks passed