It looks like Llama-3.1 support was only merged into the main branch, so it can't work with the trtllm 0.12.0 release branch for Llama-3.1 models? Could the expert team help check this?
I used the model config and `max_num_tokens`/`max_batch_size` from Meta-Llama-3-70B to build the engine, but I am hitting OOM when running it. I am building the engine like this:
The build completes fine, but it reports OOM when running gptManagerBenchmark with the input length 128 / output length 128 case.
I also tried reducing max_num_tokens to 4094 and max_batch_size from 1024 down to 256, but all of these configs still OOM at runtime.
So I'm wondering: what are workable hyper-parameters for this sample?
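For what it's worth, a back-of-the-envelope estimate suggests the KV cache alone can explain the OOM at these settings. This is just a sketch assuming an fp16 KV cache on a single GPU (no tensor-parallel split) and the published Llama-3-70B config (80 layers, 8 KV heads with GQA, head dim 128):

```python
# Rough fp16 KV-cache sizing for Llama-3-70B (published config values).
layers, kv_heads, head_dim, dtype_bytes = 80, 8, 128, 2

# Both K and V are cached per layer, hence the leading factor of 2.
bytes_per_token = 2 * layers * kv_heads * head_dim * dtype_bytes
print(bytes_per_token)  # 327680 bytes, i.e. 320 KiB per cached token

# With max_batch_size=1024 and 128 input + 128 output tokens per sequence:
tokens = 1024 * (128 + 128)
kv_cache_gib = bytes_per_token * tokens / 2**30
print(kv_cache_gib)  # 80.0 GiB for the KV cache alone
```

80 GiB of KV cache, on top of the fp16 weights (roughly 140 GB for a 70B model) and activations, would not fit on a single 80 GB GPU, which is consistent with OOM even at smaller batch sizes unless the memory is spread across more GPUs or the batch/token limits are lowered much further.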
Thanks