Enable building dev docker image with CPP backend support #2976
Conversation
We need a separate Dockerfile for cpp at this moment. Otherwise, the image is too big and too complicated.
@namannandan We should use docker multi-stage builds, i.e. don't do any installation in the final build stage, since that would make the image size bigger. Please check how this is done with compile-image and production-image in the Dockerfile.
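To illustrate the pattern being suggested, here is a minimal multi-stage sketch. The stage names follow the compile-image/production-image convention mentioned above, while the base image, paths, and build command are placeholders rather than the actual torchserve build:

```
# Build stage: toolchain and compilation happen only here
FROM ubuntu:20.04 AS compile-image
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /serve
COPY . .
# Placeholder for the actual CPP backend build step
RUN ./cpp/build.sh

# Production stage: copy only the resulting binaries, keep the image slim
FROM ubuntu:20.04 AS production-image
COPY --from=compile-image /serve/cpp/_build/bin /usr/local/bin
```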
Force-pushed from 186f53c to b5b10b8
@namannandan Can you please check the regression failures?
Can we change the file name to Dockerfile.cpp? From what I've seen in various repos, the convention is Dockerfile.xyz.
Some minor remarks to address before merging. Are there plans for a prod version with a two-stage docker build that copies the binaries and discards the build artifacts?
Force-pushed from b9bf629 to 5caf2a2
Yes, I do have an initial implementation of the multi-stage build which copies only the binaries to the final production image, but for GPU support, the CPP backend & dependencies compilation during docker build requires more work and testing. So, targeting only the dev container for the upcoming release. Will create a separate PR for the production container.
LGTM see comments
@@ -4,6 +4,24 @@
* GCC version: gcc-9
* cmake version: 3.18+
it should be 3.26.4
```
cd serve/docker
# For CPU support
./build_image.sh -bt dev -cpp
# For GPU support
./build_image.sh -bt dev -g [-cv cu121|cu118] -cpp
```
Start the container and optionally bind mount a build directory into the container to persist build artifacts across container runs:
```
# For CPU support
docker run [-v /path/to/build/dir:/serve/cpp/_build] -it pytorch/torchserve:cpp-dev-cpu /bin/bash
# For GPU support
docker run --gpus all [-v /path/to/build/dir:/serve/cpp/_build] -it pytorch/torchserve:cpp-dev-gpu /bin/bash
```
The detailed docker information should be added to the docker README so that the information stays centralized. Here we can just provide a link to the docker README.
ARG CMAKE_VERSION=3.26.4
ARG BRANCH_NAME="master"
ARG USE_CUDA_VERSION=""
gcc version should be set.
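One way to make the gcc version explicit would be a build argument plus update-alternatives; this is a sketch only, and the GCC_VERSION argument name is not part of the current Dockerfile:

```
ARG GCC_VERSION=9

# Install and select the pinned gcc/g++ so the CPP backend is always built with a known toolchain
RUN apt-get update \
    && apt-get install -y --no-install-recommends gcc-${GCC_VERSION} g++-${GCC_VERSION} \
    && update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-${GCC_VERSION} 100 \
    && update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-${GCC_VERSION} 100 \
    && rm -rf /var/lib/apt/lists/*
```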
# Enable installation of recent cmake release
# Ref: https://apt.kitware.com/
RUN (wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | tee /usr/share/keyrings/kitware-archive-keyring.gpg >/dev/null) \
Where is $CMAKE_VERSION used in the installation? It seems that the latest cmake is installed in this command.
The cmake version is pinned here (lines 63 to 66 in 3694b11):

# Pin cmake and cmake-data version
# Ref: https://manpages.ubuntu.com/manpages/xenial/man5/apt_preferences.5.html
RUN echo "Package: cmake\nPin: version $CMAKE_VERSION*\nPin-Priority: 1001" > /etc/apt/preferences.d/cmake
RUN echo "Package: cmake-data\nPin: version $CMAKE_VERSION*\nPin-Priority: 1001" > /etc/apt/preferences.d/cmake-data

So when we install cmake via serve/ts_scripts/install_dependencies.py (line 17 in 3694b11: "cmake",), it will install cmake 3.26.4.
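If needed, the pin can be verified inside a built image with a quick manual check (not part of the Dockerfile):

```
# The pinned version should show up as both installed and candidate
apt-cache policy cmake
# And the binary itself should report 3.26.4
cmake --version
```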
RUN apt-get update && \
    if echo "$BASE_IMAGE" | grep -q "cuda:"; then \
        if [ "$USE_CUDA_VERSION" = "cu121" ]; then \
            apt-get -y install cuda-toolkit-12-1; \
        elif [ "$USE_CUDA_VERSION" = "cu118" ]; then \
            apt-get -y install cuda-toolkit-11-8; \
        else \
            echo "Cuda version not supported by CPP backend: $USE_CUDA_VERSION"; \
            exit 1; \
        fi; \
    fi \
    && rm -rf /var/lib/apt/lists/*
All of this should be replaced with install_dependencies (ref: https://github.com/pytorch/serve/blob/master/docker/Dockerfile.dev#L71)
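As a rough sketch of what that could look like, the apt logic above would be delegated to the helper script; the flag names below are indicative only, so the options actually accepted by ts_scripts/install_dependencies.py should be checked before adopting this:

```
# Delegate dependency installation to the repo's helper script instead of listing packages here
RUN if [ -n "$USE_CUDA_VERSION" ]; then \
        python ts_scripts/install_dependencies.py --environment=dev --cuda=$USE_CUDA_VERSION; \
    else \
        python ts_scripts/install_dependencies.py --environment=dev; \
    fi
```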
Additional packages aside from the ones installed by install_dependencies.py are required. Following are the error traces I see if the cuda-toolkit is not installed:
[2024-03-11 23:50:09,812] torch._inductor.utils: [WARNING] not enough SMs to use max_autotune_gemm mode
In file included from /home/venv/lib/python3.9/site-packages/torch/include/torch/csrc/inductor/aoti_runtime/model.h:15,
from /home/venv/lib/python3.9/site-packages/torch/include/torch/csrc/inductor/aoti_runtime/model_container.h:13,
from /serve/examples/cpp/aot_inductor/bert/cly46ndmfcer53dv4xkdyjmpl3mc7277slw3od3ue5ygudxcb766.cpp:2:
/home/venv/lib/python3.9/site-packages/torch/include/torch/csrc/inductor/aoti_runtime/device_utils.h:14:10: fatal error: cuda.h: No such file or directory
14 | #include <cuda.h>
| ^~~~~~~~
compilation terminated.
Installing cuda-compiler resolves the above issue, but we'll then run into:
-- Found Threads: TRUE
CMake Error at /home/venv/lib/python3.9/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:70 (message):
Failed to find nvToolsExt
Call Stack (most recent call first):
/home/venv/lib/python3.9/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:87 (include)
/home/venv/lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
CMakeLists.txt:25 (find_package)
Instead of manually having to install all the necessary packages individually, it is cleaner to install the cuda-toolkit.
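As a quick sanity check inside the GPU image, one can confirm that the pieces the build was missing are present once cuda-toolkit is installed; the paths below are the usual CUDA install locations and are not guaranteed for every base image:

```
# cuda.h was the first missing header
ls /usr/local/cuda/include/cuda.h
# nvToolsExt was the library CMake failed to find
ldconfig -p | grep -i nvtoolsext
```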
Description
Docker container with CPP backend support on CPU
docker_build_log.txt
Docker container with CPP backend support on GPU
docker_build_gpu_log.txt
Fixes #2908
Type of change
Feature/Issue validation/testing
install_dependencies_log.txt
cpp_build_log.txt
install_from_src_logs.txt
ts_log.log
install_dependencies_gpu_log.txt
cpp_build_gpu_log.txt
install_from_src_gpu_logs.txt
ts_log.log