Llama 3.1 405B with 128k context length #2383
-
Hi guys, I want to host Llama 3.1 405B and am wondering about the hardware requirements and the correct TGI settings. My GPU budget is a whole DGX (8 x H100 with 80 GB each), and I am not sure it is enough. Launching with:

```
text-generation-launcher --model-id=hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 --port=80 --max-best-of=1 --quantize=awq --max-input-tokens=127000 --max-total-tokens=128000
```

results in the following error:

```
RuntimeError: Not enough memory to handle 127050 prefill tokens. You need to decrease `--max-batch-prefill-tokens`
2024-08-09T07:50:03.586051Z ERROR warmup{max_input_length=127000 max_prefill_tokens=127050 max_total_tokens=128000 max_batch_size=None}:warmup: text_generation_client: router/client/src/lib.rs:46: Server error: CANCELLED
```

The error message itself is obvious; I am just wondering whether this is the correct way to do this, because I thought my DGX would be enough. Any thoughts or ideas on this? Thank you very much! Best
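For what it's worth, here is a minimal sketch of the kind of adjustment the error message points toward: keep the AWQ-quantized model, shard it explicitly across all 8 GPUs, and cap `--max-batch-prefill-tokens` below the full 127k input. The flag names are standard `text-generation-launcher` options, but the specific values (and whether your TGI version also requires lowering `--max-input-tokens` to match the prefill cap) are assumptions to experiment with, not verified settings.

```bash
# Sketch only: values are guesses, not verified on an 8 x H100 DGX.
text-generation-launcher \
  --model-id=hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 \
  --port=80 \
  --max-best-of=1 \
  --quantize=awq \
  --num-shard=8 \
  --max-input-tokens=127000 \
  --max-total-tokens=128000 \
  --max-batch-prefill-tokens=8192   # assumed cap; adjust until warmup fits in GPU memory
```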
Replies: 1 comment 1 reply
-
Basically, as I understand it, TGI cannot support the full context length for SOTA models at this point. You can maybe get 40k tokens on a single node even with Llama 3.1 70B, Mistral Large only 64k, etc. See here: #2301
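To illustrate that limit, a launch in line with the ~40k estimate above might look like the sketch below. The exact context length that fits depends on the model, quantization, and TGI version, so treat these numbers as assumed starting points to tune rather than verified settings.

```bash
# Sketch: shrink the context window until warmup succeeds, then grow it back.
# 40000/40500 are illustrative numbers based on the estimate above, not tested values.
text-generation-launcher \
  --model-id=hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 \
  --quantize=awq \
  --num-shard=8 \
  --max-input-tokens=40000 \
  --max-total-tokens=40500 \
  --max-batch-prefill-tokens=40000
```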