
Example for Llama2 on Inf2 #2458

Merged
merged 27 commits into pytorch:master from naman-inf2-example-refactor on Sep 19, 2023

Conversation

@namannandan namannandan (Collaborator) commented Jul 12, 2023

Description

This PR adds an example that details the steps to compile and run the Llama2 model on Inferentia2 for text completion with micro batching and response streaming support.

Model: https://huggingface.co/meta-llama/Llama-2-13b-hf
Instance type: inf2.24xlarge

Type of change

  • New feature (non-breaking change which adds functionality)

Test

$ torchserve --ncs --start --model-store model_store
$ curl -X POST "http://localhost:8081/models?url=llama-2-13b" 
$ python test_stream_response.py 
Today the weather is really nice and I am planning on going to the beach. I am going to take my camera and take some pictures. I am going to take pictures of the beach and the ocean. I am going to 
take pictures of the people and the animals. I am going to take pictures of the sun and the sky. I am going to take pictures of the sand and the water. I am going to take pictures of the waves and 
the birds. I am going to take pictures of the shells and the rocks. I am going to
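
For reference, a minimal sketch of what a streaming client such as test_stream_response.py might look like is shown below. This is an illustration under assumptions, not the script from the PR: the inference port (8080), model name (llama-2-13b), prompt text, and use of the requests library are placeholders.

import requests

# Hypothetical streaming client: POST a prompt to the TorchServe inference API
# and print response chunks as they arrive from the streaming handler.
response = requests.post(
    "http://localhost:8080/predictions/llama-2-13b",
    data="Today the weather is really nice and I am planning on",
    stream=True,
)
for chunk in response.iter_content(chunk_size=None):
    if chunk:
        print(chunk.decode("utf-8"), end="", flush=True)
print()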

@codecov codecov bot commented Jul 12, 2023

Codecov Report

Merging #2458 (bbcdaf2) into master (80b1679) will decrease coverage by 0.59%.
The diff coverage is 12.12%.

❗ Current head bbcdaf2 differs from pull request most recent head cfaf385. Consider uploading reports for the commit cfaf385 to get more accurate results

@@            Coverage Diff             @@
##           master    #2458      +/-   ##
==========================================
- Coverage   70.87%   70.29%   -0.59%     
==========================================
  Files          83       84       +1     
  Lines        3839     3871      +32     
  Branches       58       58              
==========================================
  Hits         2721     2721              
- Misses       1114     1146      +32     
  Partials        4        4              
Files Changed Coverage Δ
ts/handler_utils/hf_batch_streamer.py 0.00% <0.00%> (ø)
ts/handler_utils/micro_batching.py 90.29% <80.00%> (-0.62%) ⬇️

... and 1 file with indirect coverage changes


@namannandan namannandan marked this pull request as ready for review July 14, 2023 18:19
@namannandan namannandan changed the title Llama on Inf2 example Example for Llama on Inf2 Jul 14, 2023
@lxning lxning (Collaborator) left a comment

save_split_checkpoints.py is a common utility that other examples can reuse. Please move it to large_models/util/ and rename it to inf2_save_split_checkpoints.py.
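
For context, a rough sketch of what such a checkpoint-splitting utility typically does with transformers-neuronx; the model name and output directory are placeholders, and this shows the commonly documented save_pretrained_split pattern rather than the exact contents of the script in this PR.

from transformers import AutoModelForCausalLM
from transformers_neuronx.module import save_pretrained_split

# Load the Hugging Face checkpoint on CPU without materializing duplicate copies.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf", low_cpu_mem_usage=True
)

# Save the weights as per-layer shards so transformers-neuronx can load them
# efficiently when compiling for Inferentia2.
save_pretrained_split(model, "./llama-2-13b-split")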

@namannandan namannandan force-pushed the naman-inf2-example-refactor branch 2 times, most recently from 0dd2d87 to b9f2654 Compare July 27, 2023 21:42
@namannandan namannandan changed the title Example for Llama on Inf2 Example for Llama2 on Inf2 Jul 27, 2023
@namannandan namannandan force-pushed the naman-inf2-example-refactor branch 2 times, most recently from 631b253 to 5e8b713 Compare July 28, 2023 17:46
@namannandan namannandan requested a review from lxning July 28, 2023 17:57
@namannandan namannandan requested a review from lxning August 1, 2023 06:19
@namannandan namannandan changed the title Example for Llama2 on Inf2 [WIP] Example for Llama2 on Inf2 Aug 24, 2023
@lxning lxning (Collaborator) commented Sep 5, 2023

Please add the AOT precompile feature and store the compiled model in a cache in this example:

  • update the README
  • replace inf2_handler.py with the new one

@namannandan namannandan changed the title [WIP] Example for Llama2 on Inf2 Example for Llama2 on Inf2 Sep 18, 2023
@namannandan namannandan (Collaborator, Author) commented Sep 18, 2023

Issue to track follow-up tasks on this PR: #2600

ts/handler_utils/hf_batch_streamer.py (resolved)
examples/large_models/inferentia2/llama2/Readme.md (outdated, resolved)
examples/large_models/inferentia2/llama2/Readme.md (outdated, resolved)
tp_degree=tp_degree,
)
logger.info("Starting to compile the model")
self.model.to_neuron()
A collaborator commented on the handler code above:

@namannandan I am wondering if compilation can be done ahead of time so that we just load the compiled graphs here, the way it worked for inf1?

Collaborator reply:
I tested _save_compiled_artifacts. It is able to generate a Neuron model; however, transformers_neuronx still needs to recompile. I have already let the Neuron team know that the experimental _save_compiled_artifacts feature needs more work.
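
For context, a rough sketch of the ahead-of-time compile-and-cache pattern being discussed, assuming transformers-neuronx's LlamaForSampling API; the batch size, tp_degree, and paths are illustrative, and the experimental _save_compiled_artifacts call is the one mentioned above, whose exact signature and behavior may differ across Neuron SDK releases.

from transformers_neuronx.llama.model import LlamaForSampling

# Load the split checkpoint and configure tensor parallelism
# (illustrative values for an inf2.24xlarge with 12 NeuronCores).
model = LlamaForSampling.from_pretrained(
    "./llama-2-13b-split", batch_size=1, tp_degree=12, amp="f16"
)

# Compile the model graphs for the NeuronCores; this is the expensive step
# that the handler currently performs at load time via to_neuron().
model.to_neuron()

# Experimental: persist the compiled artifacts so a later worker could try to
# reuse them instead of recompiling (as noted above, reuse did not yet work).
model._save_compiled_artifacts("./neuron_artifacts")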

@namannandan namannandan dismissed stale reviews from mreso and chauhang September 19, 2023 21:55

Addressed review comments. Follow-up tasks tracked here: #2600

@lxning lxning added this pull request to the merge queue Sep 19, 2023
Merged via the queue into pytorch:master with commit d0ae857 Sep 19, 2023
11 of 12 checks passed

7 participants