Failed to load Whisper decoder engine with paged KV cache #1930
Comments
@MahmoudAshraf97 I am investigating the issue now and will update here later.
Reproduced with
@MahmoudAshraf97 Yeah, the fix has not been merged into main yet. I'll let you know here once it gets merged.
Faced the same issue with ENV CUDA_VERSION 12.6
In my case, it seems the issue happens here:
The right way should be something like the model_runner config builder here:
However, the root cause seems to be here: https://github.com/NVIDIA/TensorRT-LLM/blob/main/benchmarks/python/gpt_benchmark.py#L83 which somehow sets the string. I put a small PR change here: #2219, let me know if it's good to check in. There are some other parts to fix in benchmarking as well (such as the quantization), and I'll try to see if I can fix them locally first.
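To make the failure mode concrete, here is a minimal sketch of the pattern being described, assuming the benchmark assigns a raw string where the runtime expects an enum value. `KVCacheType` and `ModelConfig` below are simplified stand-ins for illustration, not the actual TensorRT-LLM definitions:

```python
from dataclasses import dataclass
from enum import Enum


class KVCacheType(Enum):
    # Simplified stand-in for the runtime's KV cache type enum.
    CONTINUOUS = "continuous"
    PAGED = "paged"


@dataclass
class ModelConfig:
    # Simplified stand-in for the runner's model config.
    kv_cache_type: KVCacheType


# Bug pattern: a raw string is stored where an enum member is expected.
# Dataclasses do not enforce the annotation, so this slips through silently.
broken = ModelConfig(kv_cache_type="paged")
assert broken.kv_cache_type != KVCacheType.PAGED  # a string never equals the enum

# Fix pattern (the spirit of the change in #2219): convert before constructing.
fixed = ModelConfig(kv_cache_type=KVCacheType("paged"))
assert fixed.kv_cache_type == KVCacheType.PAGED
```

Downstream code that branches on the enum (for example, deciding whether to set up a paged KV cache) silently takes the wrong path when it receives the string instead.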
Many thanks! This works for me. I faced a similar error when calling benchmarks/python/benchmark.py with this command. In my case, the error reports this:
And applying this minor change locally (@qingquansong's PR) solved the problem:
Not reproducible in
System Info
Who can help?
@byshiue
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Build using the official example instructions and switch `remove_input_padding` and `paged_kv_cache` to `enable`, then load the model using the class in `run.py`.
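For reference, a sketch of what that build step might look like; the checkpoint and output paths are placeholders, and the full flag set for the Whisper example may differ:

```bash
# Build the decoder engine with packed input and paged KV cache enabled,
# matching the two switches named above. Paths are placeholders.
trtllm-build \
  --checkpoint_dir ./whisper_checkpoint/decoder \
  --output_dir ./whisper_engine/decoder \
  --remove_input_padding enable \
  --paged_kv_cache enable
```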
Expected behavior
The model should load fine
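For illustration, loading could look roughly like the following. The Whisper example wraps engine loading in its own class in run.py, so the use of the generic ModelRunner and the engine path below are assumptions, not the exact code from the example:

```python
from tensorrt_llm.runtime import ModelRunner

# Hypothetical engine directory from the build step above; deserializing
# the decoder engine is the point where this issue's failure occurs.
runner = ModelRunner.from_dir(engine_dir="./whisper_engine/decoder")
```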
Actual behavior

Additional notes
I built with paged KV cache enabled to use in-flight batching. It's not in a usable state for now, but that's a separate issue; check #1909.