Dockerfile and accompanying documentation (#970)
* Dockerfile and accompanying documentation

  The Dockerfile provides some flexibility in selecting which version of HeAT should be inside the Docker image. Also, one can choose whether to install from source or from PyPI.
* README.md describing containerization
* Fix indentation in README.md

  Some code sections had a mix of spaces and tabs, which have now been converted into tabs.
* Docker support

  Use pytorch 1.11
  Fix problem with CUDA package repo keys
* Ensure mpi4py installation from source
* Migrate to NVidia PyTorch base image

  NVidia images come with support for HPC systems desirable for our uses. They work a little differently internally and required some changes. The tzdata configuration configures the CET/CEST timezone, which seems to be required when installing additional packages. There is an issue with pip caches in the image, which caused the final cache purge to fail in the PyPI release based build. This is fixed through a final invocation of true.
* Provide sample file for Singularity
* feat: singularity definition file and slurm multi-node example in the docker readme
* docs: quick_start.md has a docker section with link to docker readme
* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci
* ci: docker cleanup
* ci: build docker action, updated docs
* Apply suggestions from code review

  Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
* README suggestions
* docs: removed system specific flag from example slurm file

---------

Co-authored-by: Gutiérrez Hermosillo Muriedas, Juan Pedro <juanpedroghm@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
1 parent 9ea256b, commit 966a7a8
Showing 6 changed files with 200 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,68 @@ | ||
name: 'Build and upload Docker img' | ||
on: | ||
workflow_dispatch: | ||
inputs: | ||
heat_version: | ||
description: 'Heat version' | ||
required: true | ||
default: '1.2.2' | ||
type: string | ||
pytorch_img: | ||
description: 'Base PyTorch Img' | ||
required: true | ||
default: '23.03-py3' | ||
type: string | ||
name: | ||
description: 'Output Image name' | ||
required: true | ||
default: 'heat:1.2.2_torch1.13_cu12.1' | ||
type: string | ||
jobs: | ||
build-and-push-img: | ||
runs-on: ubuntu-latest | ||
steps: | ||
- | ||
name: Checkout | ||
uses: actions/checkout@v3 | ||
- | ||
name: Set up QEMU | ||
uses: docker/setup-qemu-action@v2 | ||
- | ||
name: Set up Docker Buildx | ||
uses: docker/setup-buildx-action@v2 | ||
with: | ||
driver: docker | ||
- | ||
name: Login to GitHub Container Registry | ||
uses: docker/login-action@v2 | ||
with: | ||
registry: ghcr.io | ||
username: ${{ github.repository_owner }} | ||
password: ${{ secrets.GITHUB_TOKEN }} | ||
- | ||
name: Build | ||
uses: docker/build-push-action@v4 | ||
with: | ||
context: docker/ | ||
build-args: | | ||
HEAT_VERSION=${{ inputs.heat_version }} | ||
PYTORCH_IMG=${{ inputs.pytorch_img}} | ||
load: true | ||
tags: | | ||
test_${{ inputs.name }} | ||
- | ||
name: Test | ||
run: | | ||
docker images | ||
docker run -v `pwd`:`pwd` -w `pwd` --rm test_${{ inputs.name }} pytest | ||
- | ||
name: Build and push | ||
uses: docker/build-push-action@v4 | ||
with: | ||
context: docker/ | ||
build-args: | | ||
HEAT_VERSION=${{ inputs.heat_version }} | ||
PYTORCH_IMG=${{ inputs.pytorch_img}} | ||
push: true | ||
tags: | | ||
ghcr.io/helmholtz-analytics/${{ inputs.name }} |
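
The way the workflow above uses the `name` input can be traced with a small Python sketch. The helper `image_refs` is hypothetical and not part of the repository; it only mirrors the two `tags:` entries, assuming the first build is loaded locally under a `test_` prefix for the pytest step and the second is pushed to ghcr.io.

```python
def image_refs(name: str, owner: str = "helmholtz-analytics") -> dict:
    """Return the local test tag and the registry reference for an image name.

    Hypothetical helper: it mirrors the `tags:` entries of the two build steps
    and does not call Docker or GitHub.
    """
    return {
        "test": f"test_{name}",           # tag used by the "Build" and "Test" steps
        "push": f"ghcr.io/{owner}/{name}",  # reference used by "Build and push"
    }


# Default value of the workflow's `name` input.
refs = image_refs("heat:1.2.2_torch1.13_cu12.1")
print(refs["test"])
print(refs["push"])
```

The image is only pushed after the `pytest` run inside the `test_` image succeeds, since a failing step stops the job before "Build and push".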
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
ARG PACKAGE_NAME=heat | ||
ARG HEAT_VERSION=1.2.2 | ||
ARG PYTORCH_IMG=22.05-py3 | ||
ARG HEAT_BRANCH=main | ||
ARG INSTALL_TYPE=release | ||
|
||
FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base | ||
COPY ./tzdata.seed /tmp/tzdata.seed | ||
RUN debconf-set-selections /tmp/tzdata.seed | ||
RUN apt update && DEBIAN_FRONTEND=noninteractive apt install -y build-essential openssh-client python3-dev git && apt clean && rm -rf /var/lib/apt/lists/* | ||
|
||
FROM base AS source-install | ||
ARG HEAT_BRANCH | ||
RUN git clone -b ${HEAT_BRANCH} https://github.com/helmholtz-analytics/heat.git ; cd heat; pip install mpi4py --no-binary :all: ; pip install .[hdf5,netcdf]; pip cache purge ; cd ..; rm -rf heat | ||
|
||
FROM base AS release-install | ||
ARG PACKAGE_NAME | ||
ARG HEAT_VERSION | ||
RUN pip install mpi4py --no-binary :all: ; if [ "x${HEAT_VERSION}" = "x" ]; then pip install ${PACKAGE_NAME}[hdf5,netcdf]; else pip install ${PACKAGE_NAME}[hdf5,netcdf]==${HEAT_VERSION}; fi ; pip cache purge ; true | ||
|
||
FROM ${INSTALL_TYPE}-install AS final |
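
The `ARG` declarations in the Dockerfile above are set at build time with one `--build-arg KEY=VALUE` pair per argument. The following Python sketch assembles such a command line; the helper `docker_build_command` and the tag `heat:dev` are illustrative assumptions, not part of the repository, and the sketch only prints the commands rather than running Docker.

```python
import shlex


def docker_build_command(tag, context=".", **build_args):
    """Assemble a `docker build` argument list.

    Each keyword argument becomes one `--build-arg KEY=VALUE` pair.
    Hypothetical helper for illustration only.
    """
    cmd = ["docker", "build"]
    for key, value in build_args.items():
        cmd += ["--build-arg", f"{key}={value}"]
    cmd += ["-t", tag, context]
    return cmd


# Release install from PyPI (the Dockerfile's default INSTALL_TYPE).
release = docker_build_command(
    "heat:local", HEAT_VERSION="1.2.2", PYTORCH_IMG="22.05-py3"
)
# Source install from a branch of the GitHub repository.
source = docker_build_command(
    "heat:dev", INSTALL_TYPE="source", HEAT_BRANCH="main"
)

print(shlex.join(release))
print(shlex.join(source))
```

`INSTALL_TYPE` selects the final stage through `FROM ${INSTALL_TYPE}-install AS final`, so only the values `release` and `source` match a stage defined in this Dockerfile.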
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
# Docker images of Heat | ||
|
||
There is some flexibility in how the Docker images of Heat can be built. | ||
|
||
Firstly, one can build from a released version taken from PyPI. This will be the | ||
Dockerfile's default version, another version set through the `--build-arg=HEAT_VERSION=1.2.0` | ||
argument, or the latest release if `HEAT_VERSION` is set to an empty string. | ||
|
||
Secondly, one can build a Docker image from the GitHub sources, selected through | ||
`--build-arg=INSTALL_TYPE=source`. The default branch to be built is `main`; other | ||
branches can be specified using `--build-arg=HEAT_BRANCH=branchname`. | ||
|
||
## General build | ||
|
||
### Docker | ||
|
||
The [Dockerfile](./Dockerfile) guiding the build of the Docker image is located in this | ||
directory. It is typically most convenient to `cd` into this directory and run the Docker build as: | ||
|
||
```console | ||
$ docker build --build-arg HEAT_VERSION=1.2.2 --build-arg PYTORCH_IMG=22.05-py3 -t heat:local . | ||
``` | ||
|
||
We also offer prebuilt images in our [Package registry](https://github.com/helmholtz-analytics/heat/pkgs/container/heat) from which you can pull existing images: | ||
|
||
|
||
```console | ||
$ docker pull ghcr.io/helmholtz-analytics/heat:1.2.0-dev_torch1.12_cuda11.7_py3.8 | ||
``` | ||
|
||
### Building for HPC | ||
|
||
With Heat being a native HPC library, one would naturally want to build the container | ||
image also for HPC systems, such as the ones available at [Jülich Supercomputing Centre | ||
(JSC)](https://www.fz-juelich.de/jsc/ "Juelich Supercomputing Centre"). We show two ways to convert the existing images from the registry into Singularity containers. | ||
|
||
#### Apptainer (formerly Singularity) | ||
|
||
To use one of the existing images from our registry: | ||
|
||
$ apptainer build heat.sif docker://ghcr.io/helmholtz-analytics/heat:1.2.0-dev_torch1.12_cuda11.7_py3.8 | ||
|
||
Building the image can require root access on some systems. If that is the case, we recommend building the image on a local machine and then uploading it to the desired HPC system. | ||
|
||
If you see an error indicating that there is not enough space, point the build at a larger temporary directory with the `--tmpdir` flag of the build command (see the [Apptainer docs](https://apptainer.org/docs/user/latest/build_a_container.html)). | ||
|
||
#### SIB (Singularity Image Builder) | ||
|
||
A simple `Dockerfile` (in addition to the one above) to be used with SIB could look like | ||
this: | ||
|
||
FROM ghcr.io/helmholtz-analytics/heat:1.2.0_torch1.12_cuda11.7_py3.8 | ||
|
||
The invocation to build the image would be: | ||
|
||
$ sib upload ./Dockerfile heat_1.2.0_torch1.12_cuda11.7_py3.8 | ||
$ sib build --recipe-name heat_1.2.0_torch1.12_cuda11.7_py3.8 | ||
$ sib download --recipe-name heat_1.2.0_torch1.12_cuda11.7_py3.8 | ||
|
||
However, SIB is capable of using just about any available Docker image from any | ||
registry, such that a specific Singularity image can be built by simply referencing the | ||
available image. SIB is thus used as a conversion tool. | ||
|
||
## Running on HPC | ||
|
||
$ singularity run --nv heat_1.2.0_torch1.11_cuda11.5_py3.9.sif /bin/bash | ||
$ python | ||
Python 3.8.13 (default, Mar 28 2022, 11:38:47) | ||
[GCC 7.5.0] :: Anaconda, Inc. on linux | ||
Type "help", "copyright", "credits" or "license" for more information. | ||
>>> import heat as ht | ||
... | ||
|
||
The `--nv` argument to `singularity` enables NVIDIA GPU support, which is desired for | ||
Heat. | ||
|
||
### Multi-node example | ||
|
||
The following file is an example of using the Singularity image together with SLURM, which allows Heat to run in a multi-node environment. | ||
|
||
```bash | ||
#!/bin/bash | ||
#SBATCH --time 0:10:00 | ||
#SBATCH --nodes 2 | ||
#SBATCH --tasks-per-node 2 | ||
|
||
... | ||
|
||
srun --mpi="pmi2" singularity exec --nv heat_1.2.0_torch1.11_cuda11.5_py3.9.sif bash -c "cd ~/code/heat/examples/lasso; python demo.py" | ||
``` |
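
As a rough illustration of the multi-node example in the README above, the batch file can be rendered from a few parameters. This is a minimal sketch: `render_sbatch` is a hypothetical helper, the image name `heat.sif` is taken from the Apptainer build command in the README, and any site-specific `#SBATCH` options (account, partition, GPUs) are deliberately left out and would have to be added for a real system.

```python
def render_sbatch(image, command, nodes=2, tasks_per_node=2, time_limit="0:10:00"):
    """Render a minimal SLURM batch script that runs `command` inside `image`.

    Hypothetical helper for illustration; it only builds the text of the
    script and does not submit anything to SLURM.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --time {time_limit}",
        f"#SBATCH --nodes {nodes}",
        f"#SBATCH --tasks-per-node {tasks_per_node}",
        "",
        # srun starts nodes * tasks_per_node processes, one container each.
        f'srun --mpi="pmi2" singularity exec --nv {image} bash -c "{command}"',
    ]
    return "\n".join(lines) + "\n"


script = render_sbatch("heat.sif", "cd ~/code/heat/examples/lasso; python demo.py")
print(script)
```

With the defaults shown, `srun` launches 2 × 2 = 4 processes, each running the command in its own container instance.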
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# This is a sample file to use with the Singularity image builder | ||
FROM ghcr.io/helmholtz-analytics/heat:1.2.0_torch1.11_cuda11.5_py3.9 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
tzdata tzdata/Areas select Europe | ||
tzdata tzdata/Zones/Europe select Berlin |