Extending TGI benchmarking and documentation #621

jimburtoft · 2024-06-03T15:38:23Z

This PR extends the README for TGI benchmarking and adds additional benchmarks.

It includes:
- A note about downloading and tagging the TGI image if you aren't building it locally (discussed with @dacorvo in #605).
- A Llama3-70B example on both Inferentia and Trainium.
- Instructions on how to compile models not available on the Hub (e.g. 32-core Llama3-70B).
- A run_all.sh script that generates benchmarks for different concurrencies. (If there is an easier way to do this, such as passing a list into the benchmark.sh script, I don't know of one.)
- No changes to the existing examples, in case you are using them for any TGI tests.
- Some batch size 1 metrics.
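For illustration only, here is a minimal sketch of the kind of loop a script like run_all.sh could use to call a benchmark command once per concurrency level. It is not the script added in this PR: the `run_all` function name and the `<model_id> <concurrency>` argument order passed to the benchmark command are assumptions.

```shell
# Hypothetical helper, not the run_all.sh from this PR.
# Usage: run_all BENCH_CMD MODEL_ID CONCURRENCY...
# Assumes BENCH_CMD accepts "<model_id> <concurrency>" as its arguments.
run_all() {
  bench_cmd=$1
  model_id=$2
  shift 2
  # Remaining arguments are the concurrency levels to sweep over.
  for concurrency in "$@"; do
    echo "Benchmarking ${model_id} at concurrency ${concurrency}"
    # Stop the sweep as soon as one run fails.
    "$bench_cmd" "$model_id" "$concurrency" || return 1
  done
}
```

Under those assumptions, a call such as `run_all ./benchmark.sh <model_id> 1 2 4 8` would run the benchmark four times, once per concurrency value, and stop at the first failing run.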

Before submitting

[x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
[x] Did you make sure to update the documentation with your changes?
[N/A] Did you write any new necessary tests?

dacorvo

Thank you very much for this pull request! I have made a few comments.

benchmark/text-generation-inference/README.md

benchmark/text-generation-inference/llama3-70b-inf2.48xlarge/.env

benchmark/text-generation-inference/llama3-70b-inf2.48xlarge/docker-compose.yaml

benchmark/text-generation-inference/llama3-70b-inf2.48xlarge/tgi-results-batchsize-1.csv

benchmark/text-generation-inference/llama3-70b-trn1.32xlarge/docker-compose.yaml

HuggingFaceDocBuilderDev · 2024-06-04T13:58:57Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

docs/source/guides/export_model.mdx

dacorvo

LGTM, thanks!

jimburtoft added 11 commits May 31, 2024 17:08

Initial Llama3-70b test

07e19d6

Missing .env file. .gitignore strikes again!

103a805

Adding script to run multiple batch sizes at once

ea72cf6

changed mode to +x on shell script

6f17f9a

fixing my poor bash syntax

3cfaa39

Renaming directory to test on Trainium

b2eaec5

adding trainium

8a7781c

Trainium compose example added

2ab93ae

Readme changes

7092a76

More Readme changes

0b4a078

Adding BS1 numbers

d0d8617

dacorvo reviewed Jun 4, 2024

jimburtoft added 2 commits June 4, 2024 09:52

Merge pull request #1 from huggingface/main (sync with main)

ed7ebc7

misspelling in export_model.mdx

b980c68

misspelling in benchmark/text-generation-inference/README.md

eedb830

Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>

dacorvo reviewed Jun 4, 2024

docs/source/guides/export_model.mdx

jimburtoft added 2 commits June 4, 2024 11:47

Removing redundant HF_BATCH_SIZE and HF_SEQUENCE_LENGTH settings from .env and docker compose.

0fb57ea

Trainium batch size 8 numbers added.

9e065c8

dacorvo approved these changes Jun 5, 2024

dacorvo merged commit af0506f into huggingface:main Jun 5, 2024
1 check passed
