add trtllm container update #2191
@@ -68,7 +68,7 @@ COPY distribution[s]/ ./
 RUN mv *.deb djl-serving_all.deb || true

 # Install CUDNN 8
-RUN apt-get update && apt-get install -y --no-install-recommends libcudnn8 && rm -rf /var/lib/apt/lists/*
+RUN apt-get update && apt-get install -y --no-install-recommends libcudnn9-cuda-12 && rm -rf /var/lib/apt/lists/*
Why not just use the 12.4.1-cudnn-devel-ubuntu22.04 image directly?
That image also installs libcudnn9-dev-cuda-12, which is around 1 GB; we don't need it.
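To check the point above against a built image, a minimal sketch along these lines could flag any cuDNN development package that slipped in. `has_cudnn_dev` is a hypothetical helper name, and the name pattern is an assumption based only on the `libcudnn9-dev-cuda-12` package mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: reads package names on stdin (one per line) and
# succeeds if any of them looks like a cuDNN development package,
# e.g. libcudnn9-dev-cuda-12. The pattern is an assumption, not an
# official naming rule.
has_cudnn_dev() {
    grep -Eq '^libcudnn[0-9]+-dev'
}

# Possible usage inside the container (not executed here):
#   dpkg-query -W -f='${Package}\n' | has_cudnn_dev \
#       && echo "cuDNN dev package is installed"
```

Keeping the check on package names rather than file paths means it does not depend on where the headers land on disk.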
 ARG cuda_python_version=12.4
 ARG peft_version=0.10.0
 ARG triton_version=r24.04
 ARG trtllm_toolkit_wheel="https://publish.djl.ai/tensorrt-llm/toolkit/tensorrt_llm_toolkit-${trtllm_toolkit_version}-py3-none-any.whl"
-ARG trtllm_wheel="https://djl-ai.s3.amazonaws.com/publish/tensorrt-llm/${trtllm_version}/tensorrt_llm-0.10.0-cp310-cp310-linux_x86_64.whl"
+ARG trtllm_wheel="https://publish.djl.ai/tensorrt-llm/${trtllm_version}/tensorrt_llm-0.11.0-cp310-cp310-linux_x86_64.whl"
@ydm-amazon this lets us use CloudFront (an AWS service), which serves our artifacts through a global CDN and gives about a 10x speedup on downloads. Using the raw S3 HTTP URL is slower than that.
That's awesome
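The URL change discussed in this thread can be sketched as a prefix rewrite. This is only an illustration: `to_cdn_url` and `time_download` are hypothetical helpers, and the mapping from `https://djl-ai.s3.amazonaws.com/publish/` to `https://publish.djl.ai/` is inferred from this one diff, so it may not hold for other paths.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a raw S3 artifact URL to its CDN form.
# The prefix mapping below is inferred from this diff only.
to_cdn_url() {
    s3_prefix="https://djl-ai.s3.amazonaws.com/publish/"
    cdn_prefix="https://publish.djl.ai/"
    case "$1" in
        "$s3_prefix"*)
            # Strip the S3 prefix and prepend the CDN prefix.
            rest=${1#"$s3_prefix"}
            printf '%s%s\n' "$cdn_prefix" "$rest"
            ;;
        *)
            # Leave unrelated URLs unchanged.
            printf '%s\n' "$1"
            ;;
    esac
}

# Hypothetical helper: print the total download time in seconds for a URL,
# discarding the body. Uses curl's --write-out variable time_total.
time_download() {
    curl -sS -o /dev/null -w '%{time_total}\n' "$1"
}
```

Running `time_download` on both forms of the same artifact URL from the build host would be one way to measure the speedup locally rather than relying on the quoted figure.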
Description
Work in progress, will test a build
https://github.com/deepjavalibrary/djl-serving/actions/runs/9984782888