add trtllm container update #2191

Merged
merged 2 commits into from
Jul 18, 2024

Contributor

@lanking520 lanking520 commented Jul 17, 2024

Description

Work in progress, will test a build

https://github.com/deepjavalibrary/djl-serving/actions/runs/9984782888

@lanking520 lanking520 changed the title [WIP] add trtllm container update add trtllm container update Jul 18, 2024
@@ -68,7 +68,7 @@ COPY distribution[s]/ ./
RUN mv *.deb djl-serving_all.deb || true

# Install CUDNN 8
-RUN apt-get update && apt-get install -y --no-install-recommends libcudnn8 && rm -rf /var/lib/apt/lists/*
+RUN apt-get update && apt-get install -y --no-install-recommends libcudnn9-cuda-12 && rm -rf /var/lib/apt/lists/*
Contributor

Why not just use the 12.4.1-cudnn-devel-ubuntu22.04 image directly?

Contributor Author

That image also installs libcudnn9-dev-cuda-12, which is around 1 GB and which we don't need.
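One way to confirm that a built image carries only the runtime cuDNN package is to list the installed `libcudnn*` packages and flag any `-dev` ones. The sketch below is illustrative only: the function names `list_cudnn_packages` and `filter_dev_packages` are hypothetical and not part of this PR, and it assumes a Debian/Ubuntu image where `dpkg-query` is available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the PR.

# Print "package<TAB>installed-size-in-KiB" for every cuDNN package dpkg knows about.
# Prints nothing (and still succeeds) when no package matches.
list_cudnn_packages() {
  dpkg-query -W -f='${Package}\t${Installed-Size}\n' 'libcudnn*' 2>/dev/null || true
}

# Read "package<TAB>size" lines on stdin and print only the -dev package names,
# e.g. libcudnn9-dev-cuda-12 but not libcudnn9-cuda-12.
filter_dev_packages() {
  awk -F'\t' '$1 ~ /-dev(-|$)/ { print $1 }'
}
```

Inside the container, `list_cudnn_packages | filter_dev_packages` should print nothing if only the runtime library was installed.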

ARG cuda_python_version=12.4
ARG peft_version=0.10.0
ARG triton_version=r24.04
ARG trtllm_toolkit_wheel="https://publish.djl.ai/tensorrt-llm/toolkit/tensorrt_llm_toolkit-${trtllm_toolkit_version}-py3-none-any.whl"
-ARG trtllm_wheel="https://djl-ai.s3.amazonaws.com/publish/tensorrt-llm/${trtllm_version}/tensorrt_llm-0.10.0-cp310-cp310-linux_x86_64.whl"
+ARG trtllm_wheel="https://publish.djl.ai/tensorrt-llm/${trtllm_version}/tensorrt_llm-0.11.0-cp310-cp310-linux_x86_64.whl"
Contributor Author

@ydm-amazon this lets us serve our artifacts through CloudFront (an AWS service), which acts as a global CDN and provides a 10x speedup on downloads. Using the raw S3 HTTP URL will be slower than that.

Contributor

That's awesome
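To check the download difference yourself, a rough sketch along these lines could time both endpoints with `curl`. The value of `trtllm_version` is an assumption (the diff does not show it), `time_download` is a hypothetical helper, and the comparison only makes sense if the same wheel is actually reachable at both URLs, which this PR does not confirm.

```shell
#!/usr/bin/env bash
# Assumed value: the PR does not show what trtllm_version is set to.
trtllm_version="0.11.0"
wheel_name="tensorrt_llm-0.11.0-cp310-cp310-linux_x86_64.whl"

# The two URL shapes from the diff: the raw S3 endpoint and the publish.djl.ai hostname.
s3_url="https://djl-ai.s3.amazonaws.com/publish/tensorrt-llm/${trtllm_version}/${wheel_name}"
cdn_url="https://publish.djl.ai/tensorrt-llm/${trtllm_version}/${wheel_name}"

# Hypothetical helper: print the total transfer time in seconds for one URL,
# discarding the downloaded body.
time_download() {
  curl -sS -L -o /dev/null -w '%{time_total}\n' "$1"
}
```

Running `time_download "$cdn_url"` and then `time_download "$s3_url"` from the build host gives a quick comparison; results will vary by region and cache state.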

@lanking520 lanking520 merged commit ec7732d into master Jul 18, 2024
18 checks passed
@lanking520 lanking520 deleted the trtllm branch July 18, 2024 03:37
4 participants