
[Question]: engine build hyper-paramters for llama-3-70B sample #2139

Closed
ZJLi2013 opened this issue Aug 22, 2024 · 3 comments
Labels: question (Further information is requested), triaged (Issue has been triaged by maintainers)

Comments


ZJLi2013 commented Aug 22, 2024

When using the model config, max_num_tokens, and max_batch_size from the meta-llama-3-70B sample to build the engine, I hit OOM when running it.

I am building the engine like this:

tp    max_num_tokens    max_batch_size
8     16384             8192
 trtllm-build --model_config $model_cfg \
                --use_fused_mlp enable \
                --gpt_attention_plugin float16 \
                --output_dir $engine_dir \
                --workers 8 \
                --max_batch_size $max_batch_size \
                --max_num_tokens $max_num_tokens \
                --max_input_len 2048 \
                --max_seq_len 4096 \
                --use_paged_context_fmha enable \
                --multiple_profiles enable     

The build completes, but it reports OOM when running gptManagerBenchmark with the input length 128 / output length 128 case.

I also tried a smaller max_num_tokens=4094 and reduced max_batch_size from 1024 down to 256, but all of these configs still give runtime OOM.
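For reference, a rough back-of-envelope estimate of the per-GPU KV-cache footprint can help reason about these settings. The helper below is a hypothetical sketch (not part of TensorRT-LLM), assuming the published llama-3-70B config (80 layers, 8 GQA KV heads, head_dim 128), an fp16 KV cache, and tensor parallelism tp=8. Note that with a paged KV cache TensorRT-LLM typically sizes the cache pool from remaining free GPU memory, so runtime OOM at large max_num_tokens/max_batch_size is more likely driven by activation and workspace memory, which also scale with those build parameters.

```python
# Back-of-envelope KV-cache sizing for llama-3-70B.
# Hypothetical helper for illustration only; model constants are taken
# from the published llama-3-70B config (80 layers, 8 KV heads, head_dim 128).

def kv_cache_bytes_per_token(num_layers=80, num_kv_heads=8,
                             head_dim=128, dtype_bytes=2, tp=8):
    """Bytes of K+V cache one token occupies on each GPU under tensor parallelism."""
    # Factor 2 covers the separate K and V tensors; the KV heads are
    # sharded across the tp ranks (at least one head per rank).
    return 2 * num_layers * max(num_kv_heads // tp, 1) * head_dim * dtype_bytes

per_token = kv_cache_bytes_per_token()      # per GPU, tp=8
total = per_token * 16384                   # max_num_tokens from the build above
print(per_token, total / 2**20)             # → 40960 640.0 (bytes, MiB)
```

Under these assumptions the KV cache for 16384 in-flight tokens is only about 640 MiB per GPU, which is small next to the ~17.5 GB of fp16 weights each of the 8 ranks holds for a 70B model; that is one reason to suspect activation/workspace memory rather than the KV cache itself.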

So I wonder: what are workable hyper-parameters for this sample?

Thanks

@ZJLi2013 (Author)

OK, it looks like the latest 0.13 is buggy; after downgrading to 0.12, it runs well.

@ZJLi2013 (Author) commented Sep 3, 2024

It looks like llama-3.1 support was only merged into the main branch, so the trtllm 0.12.0 branch can't work with llama-3.1 models? Could the expert team help check this?

@ZJLi2013 ZJLi2013 reopened this Sep 3, 2024
@lfr-0531 (Collaborator) commented Sep 4, 2024

TensorRT-LLM v0.12.0 supports llama-3.1 models; please refer to the llama examples: https://github.com/NVIDIA/TensorRT-LLM/tree/v0.12.0/examples/llama#run-llama-31-405b-model

@lfr-0531 lfr-0531 self-assigned this Sep 4, 2024
@lfr-0531 lfr-0531 added question Further information is requested triaged Issue has been triaged by maintainers labels Sep 4, 2024
@lfr-0531 lfr-0531 closed this as completed Sep 7, 2024