Add Dockerfiles for Neuron DLC with SDK 2.20.1 (#24)
*Issue #, if available:*

*Description of changes:*
Add Dockerfiles for Neuron DLC with SDK 2.20.1

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

---------

Co-authored-by: Fu Qiao <qiaofu@amazon.com>
phooq and Fu Qiao authored Oct 30, 2024
1 parent a98d3e8 commit 1066b48
Showing 11 changed files with 404 additions and 1,084 deletions.
1 change: 1 addition & 0 deletions docker/pytorch/inference/1.13.1/Dockerfile.neuron
@@ -47,6 +47,7 @@ RUN apt-get update \
unzip \
zlib1g-dev \
libcap-dev \
gnupg2 \
gpg-agent \
&& rm -rf /var/lib/apt/lists/* \
&& rm -rf /tmp/tmp* \
256 changes: 21 additions & 235 deletions docker/pytorch/inference/1.13.1/Dockerfile.neuron.cve_allowlist.json

3 changes: 2 additions & 1 deletion docker/pytorch/inference/1.13.1/Dockerfile.neuronx
@@ -7,7 +7,7 @@ LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
# Neuron SDK components version numbers
ARG NEURONX_FRAMEWORK_VERSION=1.13.1.1.16.0
ARG NEURONX_DISTRIBUTED_VERSION=0.9.0
ARG NEURONX_CC_VERSION=2.15.128.0
ARG NEURONX_CC_VERSION=2.15.141.0
ARG NEURONX_TRANSFORMERS_VERSION=0.12.313
ARG NEURONX_COLLECTIVES_LIB_VERSION=2.22.26.0-17a033bc8
ARG NEURONX_RUNTIME_LIB_VERSION=2.22.14.0-6e27b8d5b
@@ -51,6 +51,7 @@ RUN apt-get update \
unzip \
zlib1g-dev \
libcap-dev \
gnupg2 \
gpg-agent \
&& rm -rf /var/lib/apt/lists/* \
&& rm -rf /tmp/tmp* \
237 changes: 24 additions & 213 deletions docker/pytorch/inference/1.13.1/Dockerfile.neuronx.cve_allowlist.json

5 changes: 3 additions & 2 deletions docker/pytorch/inference/2.1.2/Dockerfile.neuronx
@@ -6,8 +6,8 @@ LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true

# Neuron SDK components version numbers
ARG NEURONX_DISTRIBUTED_VERSION=0.9.0
ARG NEURONX_CC_VERSION=2.15.128.0
ARG NEURONX_FRAMEWORK_VERSION=2.1.2.2.3.0
ARG NEURONX_CC_VERSION=2.15.141.0
ARG NEURONX_FRAMEWORK_VERSION=2.1.2.2.3.1
ARG NEURONX_TRANSFORMERS_VERSION=0.12.313
ARG NEURONX_COLLECTIVES_LIB_VERSION=2.22.26.0-17a033bc8
ARG NEURONX_RUNTIME_LIB_VERSION=2.22.14.0-6e27b8d5b
@@ -51,6 +51,7 @@ RUN apt-get update \
unzip \
zlib1g-dev \
libcap-dev \
gnupg2 \
gpg-agent \
&& rm -rf /var/lib/apt/lists/* \
&& rm -rf /tmp/tmp* \
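
The version bumps in these Dockerfiles are plain `ARG NAME=value` defaults, so comparing the pins of two revisions reduces to parsing those lines. A minimal Python sketch, assuming every pin is an unquoted single-token default as in these files; `parse_arg_pins` and `diff_pins` are hypothetical helpers, not part of the repository:

```python
import re

# Matches Dockerfile lines of the form "ARG NAME=value" (unquoted, single-token value).
ARG_RE = re.compile(r"^\s*ARG\s+([A-Za-z_][A-Za-z0-9_]*)=(\S+)\s*$")


def parse_arg_pins(dockerfile_text: str) -> dict[str, str]:
    """Return {ARG name: default value} for every 'ARG NAME=value' line."""
    pins = {}
    for line in dockerfile_text.splitlines():
        match = ARG_RE.match(line)
        if match:
            name, value = match.groups()
            pins[name] = value  # a later ARG with the same name overrides an earlier one
    return pins


def diff_pins(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple]:
    """Return {name: (old_value, new_value)} for ARGs that were added, removed, or changed."""
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


if __name__ == "__main__":
    # Old pins taken from the removed lines of the inference 2.1.2 hunk above.
    old = """
ARG NEURONX_CC_VERSION=2.15.128.0
ARG NEURONX_FRAMEWORK_VERSION=2.1.2.2.3.0
ARG NEURONX_TRANSFORMERS_VERSION=0.12.313
"""
    new = old.replace("2.15.128.0", "2.15.141.0").replace("2.1.2.2.3.0", "2.1.2.2.3.1")
    for name, (before, after) in diff_pins(parse_arg_pins(old), parse_arg_pins(new)).items():
        print(f"{name}: {before} -> {after}")
```

Running it prints `NEURONX_CC_VERSION: 2.15.128.0 -> 2.15.141.0` and `NEURONX_FRAMEWORK_VERSION: 2.1.2.2.3.0 -> 2.1.2.2.3.1`. ARG lines without a default value are skipped by design.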
237 changes: 24 additions & 213 deletions docker/pytorch/inference/2.1.2/Dockerfile.neuronx.cve_allowlist.json

33 changes: 29 additions & 4 deletions docker/pytorch/training/1.13.1/Dockerfile.neuronx
@@ -7,7 +7,7 @@ LABEL dlc_major_version="1"
ARG NEURONX_FRAMEWORK_VERSION=1.13.1.1.16.0
ARG NEURONX_DISTRIBUTED_VERSION=0.9.0
ARG NEURONX_DISTRIBUTED_TRAINING_VERSION=1.0.0
ARG NEURONX_CC_VERSION=2.15.128.0
ARG NEURONX_CC_VERSION=2.15.141.0
ARG NEURONX_COLLECTIVES_LIB_VERSION=2.22.26.0-17a033bc8
ARG NEURONX_RUNTIME_LIB_VERSION=2.22.14.0-6e27b8d5b
ARG NEURONX_TOOLS_VERSION=2.19.0.0
@@ -142,9 +142,34 @@ RUN ${PIP} install --no-cache-dir -U \
RUN mkdir -p /etc/pki/tls/certs && cp /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt
RUN ${PIP} config set global.extra-index-url https://pip.repos.neuron.amazonaws.com \
&& ${PIP} install --force-reinstall torch-neuronx==$NEURONX_FRAMEWORK_VERSION --extra-index-url https://pip.repos.neuron.amazonaws.com \
&& ${PIP} install --force-reinstall neuronx-cc==$NEURONX_CC_VERSION --extra-index-url https://pip.repos.neuron.amazonaws.com \
&& ${PIP} install --force-reinstall --no-deps neuronx_distributed==$NEURONX_DISTRIBUTED_VERSION --extra-index-url https://pip.repos.neuron.amazonaws.com \
&& ${PIP} install --force-reinstall --no-deps neuronx_distributed_training==$NEURONX_DISTRIBUTED_TRAINING_VERSION --extra-index-url https://pip.repos.neuron.amazonaws.com
&& ${PIP} install --force-reinstall neuronx-cc==$NEURONX_CC_VERSION --extra-index-url https://pip.repos.neuron.amazonaws.com

RUN ${PIP} install --force-reinstall --no-deps neuronx_distributed==$NEURONX_DISTRIBUTED_VERSION --extra-index-url https://pip.repos.neuron.amazonaws.com

## Installation for Neuronx Distributed Training framework
# Install Cython
RUN pip install --no-cache-dir Cython

# Copy the apex_setup.py file
COPY apex_setup.py /root/apex_setup.py

# Clone and build Apex
RUN git clone https://github.com/NVIDIA/apex.git /root/apex \
&& cd /root/apex \
&& git checkout 23.05 \
&& cp /root/apex_setup.py setup.py \
&& python3 setup.py bdist_wheel

#Install dependencies from requirements and extras for SageMaker usecase
RUN wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed-training/master/requirements.txt \
&& pip install --no-deps --no-cache-dir --no-build-isolation -r requirements.txt /root/apex/dist/apex-0.1-py3-none-any.whl \
&& pip install --force-reinstall "numba==0.57.1" \
"multiprocess==0.70.16" \
"numpy>=1.24.3,<=1.25.2" \
"dill==0.3.8"


RUN ${PIP} install --force-reinstall --no-deps neuronx_distributed_training==$NEURONX_DISTRIBUTED_TRAINING_VERSION --extra-index-url https://pip.repos.neuron.amazonaws.com

# attrs, neuronx-cc required: >=19.2.0, sagemaker <24,>=23.1.0
# protobuf neuronx-cc<4, sagemaker-training >=3.9.2,<=3.20.3
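
The Apex install step refers to the built wheel by a fixed name, `/root/apex/dist/apex-0.1-py3-none-any.whl`, which only holds if the copied `apex_setup.py` (not shown in this diff) produces version 0.1 as a pure-Python wheel. A hedged sketch of a lookup that fails loudly when that assumption breaks; `find_single_wheel` is a hypothetical helper, not something the Dockerfile uses:

```python
import glob
import os


def find_single_wheel(dist_dir: str, project: str) -> str:
    """Return the path of the only '<project>-*.whl' file in dist_dir.

    Raises FileNotFoundError if no wheel exists and RuntimeError if several do,
    so a build step fails loudly instead of installing the wrong wheel.
    """
    # Escape the directory so any glob metacharacters in it are treated literally.
    pattern = os.path.join(glob.escape(dist_dir), f"{project}-*.whl")
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no wheel matching {pattern}")
    if len(matches) > 1:
        raise RuntimeError(f"expected one wheel, found {len(matches)}: {matches}")
    return matches[0]
```

Called as `find_single_wheel("/root/apex/dist", "apex")` inside the image, it returns the one matching wheel path, or raises if the build produced zero or several wheels.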
330 changes: 125 additions & 205 deletions docker/pytorch/training/1.13.1/Dockerfile.neuronx.cve_allowlist.json

36 changes: 30 additions & 6 deletions docker/pytorch/training/2.1.2/Dockerfile.neuronx
@@ -6,8 +6,8 @@ LABEL dlc_major_version="1"
# Neuron SDK components version numbers
ARG NEURONX_DISTRIBUTED_VERSION=0.9.0
ARG NEURONX_DISTRIBUTED_TRAINING_VERSION=1.0.0
ARG NEURONX_CC_VERSION=2.15.128.0
ARG NEURONX_FRAMEWORK_VERSION=2.1.2.2.3.0
ARG NEURONX_CC_VERSION=2.15.141.0
ARG NEURONX_FRAMEWORK_VERSION=2.1.2.2.3.1
ARG NEURONX_COLLECTIVES_LIB_VERSION=2.22.26.0-17a033bc8
ARG NEURONX_RUNTIME_LIB_VERSION=2.22.14.0-6e27b8d5b
ARG NEURONX_TOOLS_VERSION=2.19.0.0
@@ -139,12 +139,36 @@ RUN ${PIP} install --no-cache-dir -U \
transformers==4.36.2 \
Pillow

RUN mkdir -p /etc/pki/tls/certs && cp /etc/ssl/certs/ca-certificates.crt /etc/pki/tls/certs/ca-bundle.crt
RUN ${PIP} config set global.extra-index-url https://pip.repos.neuron.amazonaws.com \
&& ${PIP} install --force-reinstall torch-neuronx==$NEURONX_FRAMEWORK_VERSION --extra-index-url https://pip.repos.neuron.amazonaws.com \
&& ${PIP} install --force-reinstall neuronx-cc==$NEURONX_CC_VERSION --extra-index-url https://pip.repos.neuron.amazonaws.com \
&& ${PIP} install --force-reinstall --no-deps neuronx_distributed==$NEURONX_DISTRIBUTED_VERSION --extra-index-url https://pip.repos.neuron.amazonaws.com \
&& ${PIP} install --force-reinstall --no-deps neuronx_distributed_training==$NEURONX_DISTRIBUTED_TRAINING_VERSION --extra-index-url https://pip.repos.neuron.amazonaws.com
&& ${PIP} install --force-reinstall neuronx-cc==$NEURONX_CC_VERSION --extra-index-url https://pip.repos.neuron.amazonaws.com

RUN ${PIP} install --force-reinstall --no-deps neuronx_distributed==$NEURONX_DISTRIBUTED_VERSION --extra-index-url https://pip.repos.neuron.amazonaws.com

## Installation for Neuronx Distributed Training framework
# Install Cython
RUN pip install --no-cache-dir Cython

# Copy the apex_setup.py file
COPY apex_setup.py /root/apex_setup.py

# Clone and build Apex
RUN git clone https://github.com/NVIDIA/apex.git /root/apex \
&& cd /root/apex \
&& git checkout 23.05 \
&& cp /root/apex_setup.py setup.py \
&& python3 setup.py bdist_wheel

#Install dependencies from requirements and extras for SageMaker usecase
RUN wget https://raw.githubusercontent.com/aws-neuron/neuronx-distributed-training/master/requirements.txt \
&& pip install --no-deps --no-cache-dir --no-build-isolation -r requirements.txt /root/apex/dist/apex-0.1-py3-none-any.whl \
&& pip install --force-reinstall "numba==0.57.1" \
"multiprocess==0.70.16" \
"numpy>=1.24.3,<=1.25.2" \
"dill==0.3.8"


RUN ${PIP} install --force-reinstall --no-deps neuronx_distributed_training==$NEURONX_DISTRIBUTED_TRAINING_VERSION --extra-index-url https://pip.repos.neuron.amazonaws.com

# attrs, neuronx-cc required: >=19.2.0, sagemaker <24,>=23.1.0
# protobuf neuronx-cc<4, sagemaker-training >=3.9.2,<=3.20.3
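
The requirements install in both training images uses `--no-deps`, so pip does not resolve dependencies for the packages in `requirements.txt`, and the `--force-reinstall` pins (`numba==0.57.1`, `multiprocess==0.70.16`, `numpy>=1.24.3,<=1.25.2`, `dill==0.3.8`) are applied afterwards. `pip check` is the standard way to look for broken dependency metadata in a finished image. The sketch below is a narrower, hypothetical post-build check (`check_pins` and `release_tuple` are illustrative names) that confirms these specific pins survived, using the standard-library `importlib.metadata`. Its comparisons are deliberately simplified: exact pins are string matches and ranges handle dotted-integer versions only, not full PEP 440 semantics.

```python
from importlib import metadata
from typing import Callable, Dict, Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Turn a plain release string such as '1.25.2' into (1, 25, 2).

    Simplification: only dotted integers are handled; versions with pre-release,
    post-release, dev or local segments raise ValueError.
    """
    return tuple(int(part) for part in version.split("."))


def check_pins(
    exact: Dict[str, str],
    ranges: Dict[str, Tuple[str, str]],
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, str]:
    """Return {distribution: problem} for every pin the environment violates.

    exact  -- {name: version} that must match exactly (string comparison)
    ranges -- {name: (low, high)} that must satisfy low <= version <= high
    """
    problems: Dict[str, str] = {}
    for name in list(exact) + list(ranges):
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if name in exact and found != exact[name]:
            problems[name] = f"found {found}, expected =={exact[name]}"
        elif name in ranges:
            low, high = ranges[name]
            try:
                ok = release_tuple(low) <= release_tuple(found) <= release_tuple(high)
            except ValueError:
                problems[name] = f"cannot compare non-plain version {found}"
                continue
            if not ok:
                problems[name] = f"found {found}, expected >={low},<={high}"
    return problems


if __name__ == "__main__":
    # Pins copied from the training Dockerfiles above; meant to run inside the built image.
    issues = check_pins(
        exact={"numba": "0.57.1", "multiprocess": "0.70.16", "dill": "0.3.8"},
        ranges={"numpy": ("1.24.3", "1.25.2")},
    )
    print(issues or "all pins satisfied")
```

Injecting `get_version` keeps the logic testable outside the image; inside it, the default `metadata.version` reads the installed distributions.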