
[Question]: engine build hyper-paramters for llama-3-70B sample #2139

Closed
ZJLi2013 opened this issue Aug 22, 2024 · 3 comments
Labels: question (Further information is requested), triaged (Issue has been triaged by maintainers)

Comments


ZJLi2013 commented Aug 22, 2024

When using the model config, max_num_tokens, and max_batch_size from the meta-llama-3-70B sample to build the engine, I hit OOM when running it.

I am building the engine like this:

tp    max_num_tokens    max_batch_size
8     16384             8192
 trtllm-build --model_config $model_cfg \
                --use_fused_mlp enable \
                --gpt_attention_plugin float16 \
                --output_dir $engine_dir \
                --workers 8 \
                --max_batch_size $max_batch_size \
                --max_num_tokens $max_num_tokens \
                --max_input_len 2048 \
                --max_seq_len 4096 \
                --use_paged_context_fmha enable \
                --multiple_profiles enable     

The build completes, but it reports OOM when running gptManagerBenchmark with the input length 128 / output length 128 case.

I also tried a smaller max_num_tokens=4094 and reduced max_batch_size from 1024 down to 256, but all of these configs still give runtime OOM.
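For reference, a rough back-of-envelope estimate of the per-GPU KV-cache footprint can help reason about these settings. The helper below is a hypothetical sketch (not part of TensorRT-LLM), assuming the published llama-3-70B config (80 layers, 8 GQA KV heads, head_dim 128), an fp16 KV cache, and tensor parallelism tp=8. Note that with a paged KV cache TensorRT-LLM typically sizes the cache pool from remaining free GPU memory, so runtime OOM at large max_num_tokens/max_batch_size is more likely driven by activation and workspace memory, which also scale with those build parameters.

```python
# Back-of-envelope KV-cache sizing for llama-3-70B.
# Hypothetical helper for illustration only; model constants are taken
# from the published llama-3-70B config (80 layers, 8 KV heads, head_dim 128).

def kv_cache_bytes_per_token(num_layers=80, num_kv_heads=8,
                             head_dim=128, dtype_bytes=2, tp=8):
    """Bytes of K+V cache one token occupies on each GPU under tensor parallelism."""
    # Factor 2 covers the separate K and V tensors; the KV heads are
    # sharded across the tp ranks (at least one head per rank).
    return 2 * num_layers * max(num_kv_heads // tp, 1) * head_dim * dtype_bytes

per_token = kv_cache_bytes_per_token()      # per GPU, tp=8
total = per_token * 16384                   # max_num_tokens from the build above
print(per_token, total / 2**20)             # → 40960 640.0 (bytes, MiB)
```

Under these assumptions the KV cache for 16384 in-flight tokens is only about 640 MiB per GPU, which is small next to the ~17.5 GB of fp16 weights each of the 8 ranks holds for a 70B model; that is one reason to suspect activation/workspace memory rather than the KV cache itself.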

So I wonder: what are workable hyper-parameters for this sample?

Thanks

@ZJLi2013 (Author)

OK, it looks like the latest 0.13 is buggy; after downgrading to 0.12, it runs well.

@ZJLi2013 (Author) commented Sep 3, 2024

It looks like llama-3.1 support was only merged into the main branch, so the trtllm 0.12.0 branch can't work with llama-3.1 models? Could the expert team help check this?

@ZJLi2013 ZJLi2013 reopened this Sep 3, 2024
@lfr-0531 (Collaborator) commented Sep 4, 2024

TensorRT-LLM v0.12.0 supports llama-3.1 models; please refer to the llama examples: https://github.com/NVIDIA/TensorRT-LLM/tree/v0.12.0/examples/llama#run-llama-31-405b-model

@lfr-0531 lfr-0531 self-assigned this Sep 4, 2024
@lfr-0531 lfr-0531 added question Further information is requested triaged Issue has been triaged by maintainers labels Sep 4, 2024
@lfr-0531 lfr-0531 closed this as completed Sep 7, 2024