Make torch available #520

Merged
merged 31 commits into from
Sep 21, 2023
Changes from 8 commits
aba74b7
update(pytorch): remove virtual env
Sep 6, 2023
615dafb
update(pytorch): remove virtual env
Sep 6, 2023
d492f92
update(cpu, pytorch): mamba install pytorch to base
Sep 8, 2023
211896c
update(pytorch): adjust torch installation
Sep 8, 2023
bc43ff8
update(pytorch): add ipykernel and conda env
Sep 8, 2023
3230cfe
update(pytorch): remove CUDA
Sep 8, 2023
c24dcb8
update(pytorch): add ipykernel
Sep 8, 2023
fd3456e
update(pytorch): add gputil
Sep 11, 2023
1debe15
update(tensorflow): add cuda to mamba command
Sep 11, 2023
1126a63
update(tensorflow): add cuda to mamba command
Sep 11, 2023
ad4869f
update(tensorflow): remove tensorflow-gpu
Sep 11, 2023
4680033
update(gpu-notebooks): remove conda env
Sep 11, 2023
34fc30f
update(cpu, pytorch, tensorflow): consistency
Sep 11, 2023
4348cc3
update(test_tensorflow): use tensorflow env
Sep 12, 2023
a69f041
update(test_packages): add gputil to exclude list
Sep 12, 2023
92990db
update(test_packages): add cudnn, cudatoolkit to exclude list
Sep 12, 2023
39688b0
update(pytorch, tensorflow): ipykernel install
Sep 12, 2023
c3e888d
revert(cpu): fix cpu conda env
Sep 12, 2023
271649d
update(tests): gpu available
Sep 12, 2023
08b5969
update(makefile): restore tensorflow build
Sep 12, 2023
f369386
update(tests): remove GPU test
Sep 12, 2023
fbc9b7f
update(jupyterlab): jupyter-dash caused build fail
Sep 12, 2023
7ecd7c8
update(PR): based on comments
Sep 12, 2023
229e6a1
update(rstudio): remove pin on tidymodels
Sep 12, 2023
7d805db
update(get-nvidia-stuff): 1804 to 2204
Sep 12, 2023
2032b25
Merge branch 'update-base-image-to-22.04' into make-torch-available
bryanpaget Sep 18, 2023
899728d
revert(2_tensorflow): prev working configuration
bryanpaget Sep 18, 2023
d184caf
update(0_Rocker): remove whitespace delta
bryanpaget Sep 18, 2023
cc1909d
update(2_tensorflow): new line
bryanpaget Sep 18, 2023
ab4100e
Update test_tensorflow.py: revert test
bryanpaget Sep 19, 2023
3740283
Merge branch 'update-base-image-to-22.04' into make-torch-available
bryanpaget Sep 21, 2023
1 change: 0 additions & 1 deletion Makefile
Contributor Author

I removed the CUDA drivers from the PyTorch and TensorFlow docker-bits because they are now handled by pytorch-cuda=11.8 in the pytorch docker-bit.

Also, the rstudio-server docker-bit has been added because rstudio-server was broken out into its own docker-bit.
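
To illustrate the Makefile change, here is a minimal Python sketch of the concatenation that the `pytorch tensorflow` target performs once the `1_CUDA-*` bit is dropped. It is not the project's build code: `bits_for` and `assemble_dockerfile` are hypothetical helper names, and the bit list only mirrors the two files visible in this hunk.

```python
import tempfile
from pathlib import Path


def bits_for(target):
    """Docker-bits for a GPU target, as listed in this hunk.

    Assumption: only the two files visible in the diff are included here;
    the 1_CUDA-* bit is intentionally absent.
    """
    return ["0_cpu.Dockerfile", f"2_{target}.Dockerfile"]


def assemble_dockerfile(src_dir, target, out_dir):
    """Concatenate the bits in order, like `cat a b > out` in the Makefile."""
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    out_path = out / f"{target}.Dockerfile"
    combined = "".join((src / name).read_text() for name in bits_for(target))
    out_path.write_text(combined)
    return out_path


if __name__ == "__main__":
    # Demo with throwaway files instead of the real docker-bits directory.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "docker-bits")
        src.mkdir()
        (src / "0_cpu.Dockerfile").write_text("# cpu bit\n")
        (src / "2_pytorch.Dockerfile").write_text("# pytorch bit\n")
        result = assemble_dockerfile(src, "pytorch", Path(tmp, "tmp"))
        print(result.read_text())
```

The order of the list matters, since each bit assumes the layers written by the bits before it.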

@@ -85,7 +85,6 @@ generate-dockerfiles: clean jupyterlab rstudio remote-desktop sas docker-stacks-
pytorch tensorflow: .output
$(CAT) \
$(SRC)/0_cpu.Dockerfile \
$(SRC)/1_CUDA-$($(@)-CUDA).Dockerfile \
$(SRC)/2_$@.Dockerfile \
> $(TMP)/$@.Dockerfile

20 changes: 6 additions & 14 deletions docker-bits/2_cpu.Dockerfile
@@ -1,17 +1,9 @@
# Create conda environment (CPU only) with many useful packages.

RUN conda create -n pycpu --yes \
python==3.11.0 ipython==8.11.0 sphinx==6.1.3 \
boto==2.49.0 s3fs==2023.3.0 \
dos2unix==7.4.1 parallel==20230122 \
dask==2023.3.0 numpy==1.24.2 pandas==1.5.3 pyarrow==11.0.0 scipy==1.10.1 \
scikit-learn==1.2.2 xgboost==1.7.1 \
matplotlib==3.7.1 pillow==9.4.0 \
gdal==3.6.2 geopandas==0.12.2 rasterio==1.3.6 \
opencv==4.7.0 scikit-image==0.19.3 \
gensim==4.3.0 nltk==3.8.1 spacy==3.5.0 \
pytorch==1.13.1 torchaudio==0.13.1 torchvision==0.14.1 cpuonly==2.0 \
-c pytorch -c conda-forge && \
conda clean --all -f -y && \
RUN mamba install pytorch \
torchvision \
torchaudio \
cpuonly \
-c pytorch && \
mamba clean --all -f -y && \
fix-permissions $CONDA_DIR && \
fix-permissions /home/$NB_USER
30 changes: 19 additions & 11 deletions docker-bits/2_pytorch.Dockerfile
Contributor Author

These changes update the torch virtual environment and install the required packages, including pytorch-cuda=11.8, which handles the CUDA drivers.
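
One way to sanity-check the result from inside the `torch` environment is a small status probe like the sketch below. `torch_status` is a hypothetical helper, not part of this PR or its tests. It relies only on `torch.__version__`, `torch.version.cuda`, and the `torch.cuda` query functions, and it reports "not installed" instead of failing when torch is missing.

```python
import importlib.util


def torch_status():
    """Summarize whether torch is importable and whether it can see a GPU."""
    status = {
        "installed": False,
        "version": None,
        "cuda_build": None,       # CUDA version torch was built against
        "cuda_available": False,  # True only if a usable GPU is visible
        "devices": [],
    }
    if importlib.util.find_spec("torch") is None:
        return status

    import torch  # imported lazily so the probe works without torch

    status["installed"] = True
    status["version"] = torch.__version__
    status["cuda_build"] = torch.version.cuda  # None on CPU-only builds
    status["cuda_available"] = torch.cuda.is_available()
    if status["cuda_available"]:
        status["devices"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return status


if __name__ == "__main__":
    for key, value in torch_status().items():
        print(f"{key}: {value}")
```

On a node without a GPU, `cuda_build` can still be set while `cuda_available` is false, which separates "wrong build installed" from "no device attached".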

@@ -1,11 +1,19 @@
#Install PyTorch
RUN conda create -n torch python=3.9 && \
conda install -n torch --quiet --yes -c pytorch \
'pytorch==1.13.1' \
'torchvision==0.14.1' \
'ipykernel==6.21.3' \
'torchtext==0.14.1' \
&& \
conda clean --all -f -y && \
fix-permissions $CONDA_DIR && \
fix-permissions /home/$NB_USER
# Install PyTorch GPU Packages and enable PyTorch IPyKernel
RUN mamba create -n torch python=3.11 && \
mamba install -n torch --quiet --yes -c pytorch -c nvidia \
ipykernel \
pytorch \
torchvision \
torchaudio \
# gputil has nvidia-smi
gputil \
# pytorch-cuda are the nvidia cuda drivers
pytorch-cuda=11.8 \
&& \
conda clean --all -f -y && \
fix-permissions $CONDA_DIR && \
fix-permissions /home/$NB_USER && \
source activate torch && \
python -m ipykernel install --user --name torch --display-name "PyTorch"
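
The last step above registers the environment as a Jupyter kernel named `torch` with the display name "PyTorch". As a rough way to confirm the registration, the sketch below lists kernelspecs by reading each `kernel.json`. `list_kernels` is a hypothetical helper, and the default directory assumes the usual Linux location for `--user` installs.

```python
import json
import tempfile
from pathlib import Path

# Assumption: on Linux, `ipykernel install --user` writes kernelspecs here.
DEFAULT_KERNELS_DIR = Path.home() / ".local" / "share" / "jupyter" / "kernels"


def list_kernels(kernels_dir=DEFAULT_KERNELS_DIR):
    """Map each kernel name (its directory name) to its display name."""
    kernels = {}
    root = Path(kernels_dir)
    if not root.is_dir():
        return kernels
    for spec in sorted(root.glob("*/kernel.json")):
        data = json.loads(spec.read_text())
        kernels[spec.parent.name] = data.get("display_name", spec.parent.name)
    return kernels


if __name__ == "__main__":
    # Demo against a throwaway directory shaped like a kernelspec install.
    with tempfile.TemporaryDirectory() as tmp:
        spec_dir = Path(tmp, "torch")
        spec_dir.mkdir()
        spec = {"argv": [], "display_name": "PyTorch", "language": "python"}
        (spec_dir / "kernel.json").write_text(json.dumps(spec))
        print(list_kernels(tmp))
```

Calling `list_kernels()` with no argument inside the built image should include a `torch` entry if the registration step ran for the current user.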


143 changes: 18 additions & 125 deletions output/jupyterlab-pytorch/Dockerfile
@@ -31,135 +31,28 @@ RUN pip install --force-reinstall cryptography==39.0.1 && \
fix-permissions /home/$NB_USER

###############################
### docker-bits/1_CUDA-11.8.0.Dockerfile
### docker-bits/2_pytorch.Dockerfile
###############################

# Cuda stuff for v11.8.0

## https://gitlab.com/nvidia/container-images/cuda/-/raw/ee72a6fef178d135e8366e5c88e15df39ff83c21/dist/11.8.0/ubuntu1804/base/Dockerfile

###########################
### Base
###########################

ENV NVARCH x86_64

ENV NVIDIA_REQUIRE_CUDA "cuda>=11.8 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=510,driver<511 brand=unknown,driver>=510,driver<511 brand=nvidia,driver>=510,driver<511 brand=nvidiartx,driver>=510,driver<511 brand=geforce,driver>=510,driver<511 brand=geforcertx,driver>=510,driver<511 brand=quadro,driver>=510,driver<511 brand=quadrortx,driver>=510,driver<511 brand=titan,driver>=510,driver<511 brand=titanrtx,driver>=510,driver<511 brand=tesla,driver>=515,driver<516 brand=unknown,driver>=515,driver<516 brand=nvidia,driver>=515,driver<516 brand=nvidiartx,driver>=515,driver<516 brand=geforce,driver>=515,driver<516 brand=geforcertx,driver>=515,driver<516 brand=quadro,driver>=515,driver<516 brand=quadrortx,driver>=515,driver<516 brand=titan,driver>=515,driver<516 brand=titanrtx,driver>=515,driver<516"
ENV NV_CUDA_CUDART_VERSION 11.8.89-1
ENV NV_CUDA_COMPAT_PACKAGE cuda-compat-11-8
ENV OS_VER ubuntu2204

ARG TARGETARCH

RUN apt-get update && apt-get install -y --no-install-recommends \
gnupg2 curl ca-certificates && \
curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/${OS_VER}/${NVARCH}/3bf863cc.pub | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/nvidia.gpg && \
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/${OS_VER}/${NVARCH} /" > /etc/apt/sources.list.d/cuda.list && \
apt-get purge --autoremove -y curl \
&& rm -rf /var/lib/apt/lists/*

ENV CUDA_VERSION 11.8.0

# For libraries in the cuda-compat-* package: https://docs.nvidia.com/cuda/eula/index.html#attachment-a
RUN apt-get update && apt-get install -y --no-install-recommends \
cuda-cudart-11-8=${NV_CUDA_CUDART_VERSION} \
${NV_CUDA_COMPAT_PACKAGE} \
&& rm -rf /var/lib/apt/lists/*

# Required for nvidia-docker v1
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf \
&& echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf

ENV CUDA_DIR "/usr/local/cuda"
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64:$CUDA_DIR/lib64
ENV XLA_FLAGS "--xla_gpu_cuda_data_dir=$CUDA_DIR"

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility

# ###########################
# ### Devel
# ###########################
# # https://gitlab.com/nvidia/container-images/cuda/-/raw/ee72a6fef178d135e8366e5c88e15df39ff83c21/dist/11.8.0/ubuntu1804/devel/Dockerfile
#
# $(curl -s https://gitlab.com/nvidia/container-images/cuda/-/raw/ee72a6fef178d135e8366e5c88e15df39ff83c21/dist/11.8.0/ubuntu1804/devel/Dockerfile)

###########################
### Runtime
###########################
# https://gitlab.com/nvidia/container-images/cuda/-/raw/ee72a6fef178d135e8366e5c88e15df39ff83c21/dist/11.8.0/ubuntu1804/runtime/Dockerfile

ENV NV_CUDA_LIB_VERSION 11.8.0-1

ENV NV_NVTX_VERSION 11.8.86-1
ENV NV_LIBNPP_VERSION 11.8.0.86-1
ENV NV_LIBNPP_PACKAGE libnpp-11-8=${NV_LIBNPP_VERSION}
ENV NV_LIBCUSPARSE_VERSION 11.7.5.86-1

ENV NV_LIBCUBLAS_PACKAGE_NAME libcublas-11-8
ENV NV_LIBCUBLAS_VERSION 11.11.3.6-1
ENV NV_LIBCUBLAS_PACKAGE ${NV_LIBCUBLAS_PACKAGE_NAME}=${NV_LIBCUBLAS_VERSION}

ENV NV_LIBNCCL_PACKAGE_NAME libnccl2
ENV NV_LIBNCCL_PACKAGE_VERSION 2.15.5-1
ENV NCCL_VERSION 2.15.5-1
ENV NV_LIBNCCL_PACKAGE ${NV_LIBNCCL_PACKAGE_NAME}=${NV_LIBNCCL_PACKAGE_VERSION}+cuda11.8

ARG TARGETARCH

RUN apt-get update && apt-get install -y --no-install-recommends \
cuda-libraries-11-8=${NV_CUDA_LIB_VERSION} \
cuda-toolkit-11-8 \
${NV_LIBNPP_PACKAGE} \
cuda-nvtx-11-8=${NV_NVTX_VERSION} \
libcusparse-11-8=${NV_LIBCUSPARSE_VERSION} \
${NV_LIBCUBLAS_PACKAGE} \
${NV_LIBNCCL_PACKAGE} \
&& rm -rf /var/lib/apt/lists/*

# Keep apt from auto upgrading the cublas and nccl packages. See https://gitlab.com/nvidia/container-images/cuda/-/issues/88
RUN apt-mark hold ${NV_LIBCUBLAS_PACKAGE_NAME} ${NV_LIBNCCL_PACKAGE_NAME}

# Add entrypoint items
ENV NVIDIA_PRODUCT_NAME="CUDA"

###########################
### CudNN
###########################
# https://gitlab.com/nvidia/container-images/cuda/-/raw/ee72a6fef178d135e8366e5c88e15df39ff83c21/dist/11.8.0/ubuntu1804/runtime/cudnn8/Dockerfile

ENV NV_CUDNN_VERSION 8.6.0.163
ENV NV_CUDNN_PACKAGE_NAME "libcudnn8"

ENV NV_CUDNN_PACKAGE "libcudnn8=$NV_CUDNN_VERSION-1+cuda11.8"

ARG TARGETARCH

LABEL com.nvidia.cudnn.version="${NV_CUDNN_VERSION}"

RUN apt-get update && apt-get install -y --no-install-recommends \
${NV_CUDNN_PACKAGE} \
&& apt-mark hold ${NV_CUDNN_PACKAGE_NAME} \
&& rm -rf /var/lib/apt/lists/*

# Install PyTorch GPU Packages and enable PyTorch IPyKernel
RUN mamba create -n torch python=3.11 && \
mamba install -n torch --quiet --yes -c pytorch -c nvidia \
ipykernel \
pytorch \
torchvision \
torchaudio \
# gputil has nvidia-smi
gputil \
# pytorch-cuda are the nvidia cuda drivers
pytorch-cuda=11.8 \
&& \
conda clean --all -f -y && \
fix-permissions $CONDA_DIR && \
fix-permissions /home/$NB_USER && \
source activate torch && \
python -m ipykernel install --user --name torch --display-name "PyTorch"

###############################
### docker-bits/2_pytorch.Dockerfile
###############################

#Install PyTorch
RUN conda create -n torch python=3.9 && \
conda install -n torch --quiet --yes -c pytorch \
'pytorch==1.13.1' \
'torchvision==0.14.1' \
'ipykernel==6.21.3' \
'torchtext==0.14.1' \
&& \
conda clean --all -f -y && \
fix-permissions $CONDA_DIR && \
fix-permissions /home/$NB_USER

###############################
### docker-bits/3_Kubeflow.Dockerfile
115 changes: 0 additions & 115 deletions output/jupyterlab-tensorflow/Dockerfile
@@ -30,121 +30,6 @@ RUN pip install --force-reinstall cryptography==39.0.1 && \
fix-permissions $CONDA_DIR && \
fix-permissions /home/$NB_USER

###############################
### docker-bits/1_CUDA-11.8.0.Dockerfile
###############################

# Cuda stuff for v11.8.0

## https://gitlab.com/nvidia/container-images/cuda/-/raw/ee72a6fef178d135e8366e5c88e15df39ff83c21/dist/11.8.0/ubuntu1804/base/Dockerfile

###########################
### Base
###########################

ENV NVARCH x86_64

ENV NVIDIA_REQUIRE_CUDA "cuda>=11.8 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=510,driver<511 brand=unknown,driver>=510,driver<511 brand=nvidia,driver>=510,driver<511 brand=nvidiartx,driver>=510,driver<511 brand=geforce,driver>=510,driver<511 brand=geforcertx,driver>=510,driver<511 brand=quadro,driver>=510,driver<511 brand=quadrortx,driver>=510,driver<511 brand=titan,driver>=510,driver<511 brand=titanrtx,driver>=510,driver<511 brand=tesla,driver>=515,driver<516 brand=unknown,driver>=515,driver<516 brand=nvidia,driver>=515,driver<516 brand=nvidiartx,driver>=515,driver<516 brand=geforce,driver>=515,driver<516 brand=geforcertx,driver>=515,driver<516 brand=quadro,driver>=515,driver<516 brand=quadrortx,driver>=515,driver<516 brand=titan,driver>=515,driver<516 brand=titanrtx,driver>=515,driver<516"
ENV NV_CUDA_CUDART_VERSION 11.8.89-1
ENV NV_CUDA_COMPAT_PACKAGE cuda-compat-11-8
ENV OS_VER ubuntu2204

ARG TARGETARCH

RUN apt-get update && apt-get install -y --no-install-recommends \
gnupg2 curl ca-certificates && \
curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/${OS_VER}/${NVARCH}/3bf863cc.pub | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/nvidia.gpg && \
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/${OS_VER}/${NVARCH} /" > /etc/apt/sources.list.d/cuda.list && \
apt-get purge --autoremove -y curl \
&& rm -rf /var/lib/apt/lists/*

ENV CUDA_VERSION 11.8.0

# For libraries in the cuda-compat-* package: https://docs.nvidia.com/cuda/eula/index.html#attachment-a
RUN apt-get update && apt-get install -y --no-install-recommends \
cuda-cudart-11-8=${NV_CUDA_CUDART_VERSION} \
${NV_CUDA_COMPAT_PACKAGE} \
&& rm -rf /var/lib/apt/lists/*

# Required for nvidia-docker v1
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf \
&& echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf

ENV CUDA_DIR "/usr/local/cuda"
ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64:$CUDA_DIR/lib64
ENV XLA_FLAGS "--xla_gpu_cuda_data_dir=$CUDA_DIR"

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility

# ###########################
# ### Devel
# ###########################
# # https://gitlab.com/nvidia/container-images/cuda/-/raw/ee72a6fef178d135e8366e5c88e15df39ff83c21/dist/11.8.0/ubuntu1804/devel/Dockerfile
#
# $(curl -s https://gitlab.com/nvidia/container-images/cuda/-/raw/ee72a6fef178d135e8366e5c88e15df39ff83c21/dist/11.8.0/ubuntu1804/devel/Dockerfile)

###########################
### Runtime
###########################
# https://gitlab.com/nvidia/container-images/cuda/-/raw/ee72a6fef178d135e8366e5c88e15df39ff83c21/dist/11.8.0/ubuntu1804/runtime/Dockerfile

ENV NV_CUDA_LIB_VERSION 11.8.0-1

ENV NV_NVTX_VERSION 11.8.86-1
ENV NV_LIBNPP_VERSION 11.8.0.86-1
ENV NV_LIBNPP_PACKAGE libnpp-11-8=${NV_LIBNPP_VERSION}
ENV NV_LIBCUSPARSE_VERSION 11.7.5.86-1

ENV NV_LIBCUBLAS_PACKAGE_NAME libcublas-11-8
ENV NV_LIBCUBLAS_VERSION 11.11.3.6-1
ENV NV_LIBCUBLAS_PACKAGE ${NV_LIBCUBLAS_PACKAGE_NAME}=${NV_LIBCUBLAS_VERSION}

ENV NV_LIBNCCL_PACKAGE_NAME libnccl2
ENV NV_LIBNCCL_PACKAGE_VERSION 2.15.5-1
ENV NCCL_VERSION 2.15.5-1
ENV NV_LIBNCCL_PACKAGE ${NV_LIBNCCL_PACKAGE_NAME}=${NV_LIBNCCL_PACKAGE_VERSION}+cuda11.8

ARG TARGETARCH

RUN apt-get update && apt-get install -y --no-install-recommends \
cuda-libraries-11-8=${NV_CUDA_LIB_VERSION} \
cuda-toolkit-11-8 \
${NV_LIBNPP_PACKAGE} \
cuda-nvtx-11-8=${NV_NVTX_VERSION} \
libcusparse-11-8=${NV_LIBCUSPARSE_VERSION} \
${NV_LIBCUBLAS_PACKAGE} \
${NV_LIBNCCL_PACKAGE} \
&& rm -rf /var/lib/apt/lists/*

# Keep apt from auto upgrading the cublas and nccl packages. See https://gitlab.com/nvidia/container-images/cuda/-/issues/88
RUN apt-mark hold ${NV_LIBCUBLAS_PACKAGE_NAME} ${NV_LIBNCCL_PACKAGE_NAME}

# Add entrypoint items
ENV NVIDIA_PRODUCT_NAME="CUDA"

###########################
### CudNN
###########################
# https://gitlab.com/nvidia/container-images/cuda/-/raw/ee72a6fef178d135e8366e5c88e15df39ff83c21/dist/11.8.0/ubuntu1804/runtime/cudnn8/Dockerfile

ENV NV_CUDNN_VERSION 8.6.0.163
ENV NV_CUDNN_PACKAGE_NAME "libcudnn8"

ENV NV_CUDNN_PACKAGE "libcudnn8=$NV_CUDNN_VERSION-1+cuda11.8"

ARG TARGETARCH

LABEL com.nvidia.cudnn.version="${NV_CUDNN_VERSION}"

RUN apt-get update && apt-get install -y --no-install-recommends \
${NV_CUDNN_PACKAGE} \
&& apt-mark hold ${NV_CUDNN_PACKAGE_NAME} \
&& rm -rf /var/lib/apt/lists/*


###############################
### docker-bits/2_tensorflow.Dockerfile
###############################