Dockerfile and accompanying documentation (#970)
* Dockerfile and accompanying documentation

The Dockerfile provides some flexibility in selecting which version of HeAT should be inside
the Docker image. Also, one can choose whether to install from source or from PyPI.

* README.md describing containerization

* Fix indentation in README.md

Some code sections had a mix of spaces and tabs, which have now been
converted into tabs.

* Docker support

Use pytorch 1.11
Fix problem with CUDA package repo keys

* Ensure mpi4py installation from source

* Migrate to NVidia PyTorch base image

NVidia images come with support for HPC systems desirable for our uses.
They work a little differently internally and required some changes.

The tzdata configuration configures the CET/CEST timezone, which seems
to be required when installing additional packages.

There is an issue with pip caches in the image, which caused the final
cache purge to fail in the build based on the PyPI release. This is fixed
by a final invocation of true.

* Provide sample file for Singularity

* feat: singularity definition file and slurm multi-node example in the docker readme

* docs: quick_start.md has a docker section with link to docker readme

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ci: docker cleanup

* ci: build docker action, updated docs

* Apply suggestions from code review

Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>

* README suggestions

* docs: removed system specific flag from example slurm file

---------

Co-authored-by: Gutiérrez Hermosillo Muriedas, Juan Pedro <juanpedroghm@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Claudia Comito <39374113+ClaudiaComito@users.noreply.github.com>
4 people authored Jun 20, 2023
1 parent 9ea256b commit 966a7a8
Showing 6 changed files with 200 additions and 4 deletions.
68 changes: 68 additions & 0 deletions .github/workflows/docker.yml
@@ -0,0 +1,68 @@
name: 'Build and upload Docker img'
on:
  workflow_dispatch:
    inputs:
      heat_version:
        description: 'Heat version'
        required: true
        default: '1.2.2'
        type: string
      pytorch_img:
        description: 'Base PyTorch Img'
        required: true
        default: '23.03-py3'
        type: string
      name:
        description: 'Output Image name'
        required: true
        default: 'heat:1.2.2_torch1.13_cu12.1'
        type: string
jobs:
  build-and-push-img:
    runs-on: ubuntu-latest
    steps:
      -
        name: Checkout
        uses: actions/checkout@v3
      -
        name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          driver: docker
      -
        name: Login to GitHub Container Registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      -
        name: Build
        uses: docker/build-push-action@v4
        with:
          context: docker/
          build-args: |
            HEAT_VERSION=${{ inputs.heat_version }}
            PYTORCH_IMG=${{ inputs.pytorch_img}}
          load: true
          tags: |
            test_${{ inputs.name }}
      -
        name: Test
        run: |
          docker images
          docker run -v `pwd`:`pwd` -w `pwd` --rm test_${{ inputs.name }} pytest
      -
        name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: docker/
          build-args: |
            HEAT_VERSION=${{ inputs.heat_version }}
            PYTORCH_IMG=${{ inputs.pytorch_img}}
          push: true
          tags: |
            ghcr.io/helmholtz-analytics/${{ inputs.name }}
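
Since the workflow above is triggered only through `workflow_dispatch`, it has to be started by hand. One possible way is the GitHub CLI; the following is a dry-run sketch, assuming `gh` is installed, authenticated, and run from a clone of the repository. The input names are taken from the workflow, and the values are simply its defaults. By default the script only prints the command.

```shell
# Dry-run sketch: assemble the gh command that would dispatch docker.yml.
# Input names come from the workflow; the values are its declared defaults.
set -- gh workflow run docker.yml \
  -f heat_version=1.2.2 \
  -f pytorch_img=23.03-py3 \
  -f name=heat:1.2.2_torch1.13_cu12.1
DISPATCH_CMD="$*"
echo "${DISPATCH_CMD}"

# Set DRY_RUN=0 to dispatch for real (assumes an authenticated gh CLI).
if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
```
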
21 changes: 21 additions & 0 deletions docker/Dockerfile
@@ -0,0 +1,21 @@
ARG PACKAGE_NAME=heat
ARG HEAT_VERSION=1.2.2
ARG PYTORCH_IMG=22.05-py3
ARG HEAT_BRANCH=main
ARG INSTALL_TYPE=release

FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base
COPY ./tzdata.seed /tmp/tzdata.seed
RUN debconf-set-selections /tmp/tzdata.seed
RUN apt update && DEBIAN_FRONTEND=noninteractive apt install -y build-essential openssh-client python3-dev git && apt clean && rm -rf /var/lib/apt/lists/*

FROM base AS source-install
ARG HEAT_BRANCH
RUN git clone -b ${HEAT_BRANCH} https://github.com/helmholtz-analytics/heat.git ; cd heat; pip install mpi4py --no-binary :all: ; pip install .[hdf5,netcdf]; pip cache purge ; cd ..; rm -rf heat

FROM base AS release-install
ARG PACKAGE_NAME
ARG HEAT_VERSION
RUN pip install mpi4py --no-binary :all: ; if [ "x${HEAT_VERSION}" = "x" ]; then pip install ${PACKAGE_NAME}[hdf5,netcdf]; else pip install ${PACKAGE_NAME}[hdf5,netcdf]==${HEAT_VERSION}; fi ; pip cache purge ; true

FROM ${INSTALL_TYPE}-install AS final
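
The trailing `; true` in the release-install `RUN` line relies on the shell reporting the exit status of the last command in a `;`-separated list. The sketch below illustrates that behaviour outside Docker; `false` is only a stand-in for a failing `pip cache purge`, and the variable names are illustrative.

```shell
# `false` stands in for a command that fails, such as a failing cache purge.
status_plain=0
sh -c 'false' || status_plain=$?

# Same command list with a trailing `true`: the list now ends successfully.
status_with_true=0
sh -c 'false ; true' || status_with_true=$?

echo "without true: ${status_plain}, with true: ${status_with_true}"
# prints "without true: 1, with true: 0"
```

A side effect of joining commands with `;` is that the trailing `true` also masks a failure of any earlier command in the same `RUN` line, not only the cache purge.
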
90 changes: 90 additions & 0 deletions docker/README.md
@@ -0,0 +1,90 @@
# Docker images of Heat

There is some flexibility in how the Docker images of Heat can be built.

Firstly, one can build from the released version taken from PyPI. This will either be
the latest release or the version set through the `--build-arg=HEAT_VERSION=1.2.0`
argument.
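
The choice between a pinned and an unpinned release can be sketched in plain shell. The helper `pip_spec` below is hypothetical; it only mirrors the conditional in the release-install stage of the Dockerfile and prints the requirement string that would be passed to `pip install`.

```shell
# Hypothetical helper mirroring the Dockerfile conditional:
# an empty version yields an unpinned requirement, otherwise a pinned one.
pip_spec() {
  package_name="$1"
  heat_version="$2"
  if [ "x${heat_version}" = "x" ]; then
    # No version given: pip resolves the latest release.
    echo "${package_name}[hdf5,netcdf]"
  else
    echo "${package_name}[hdf5,netcdf]==${heat_version}"
  fi
}

pip_spec heat ""      # prints "heat[hdf5,netcdf]"
pip_spec heat 1.2.0   # prints "heat[hdf5,netcdf]==1.2.0"
```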

Secondly, one can build a Docker image from the GitHub sources, selected through
`--build-arg=INSTALL_TYPE=source`. The default branch to be built is `main`; other
branches can be specified using `--build-arg=HEAT_BRANCH=branchname`.
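
As a minimal sketch, a source build could be assembled as follows. The tag `heat:source` is a hypothetical name chosen for illustration, and the script is a dry run by default: it prints the command and only executes it when `DRY_RUN=0` is set.

```shell
# Dry-run sketch of a source build; run from the docker/ directory.
HEAT_BRANCH="main"        # default branch in the Dockerfile
IMAGE_TAG="heat:source"   # hypothetical tag, pick any name

set -- docker build \
  --build-arg INSTALL_TYPE=source \
  --build-arg "HEAT_BRANCH=${HEAT_BRANCH}" \
  -t "${IMAGE_TAG}" .
BUILD_CMD="$*"
echo "${BUILD_CMD}"

# Set DRY_RUN=0 to actually run the build.
if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
```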

## General build

### Docker

The [Dockerfile](./Dockerfile) guiding the build of the Docker image is located in this
directory. It is typically most convenient to `cd` over here and run the Docker build as:

```console
$ docker build --build-arg HEAT_VERSION=1.2.2 --build-arg PYTORCH_IMG=22.05-py3 -t heat:local .
```

We also offer prebuilt images in our [Package registry](https://github.com/helmholtz-analytics/heat/pkgs/container/heat) from which you can pull existing images:


```console
$ docker pull ghcr.io/helmholtz-analytics/heat:1.2.0-dev_torch1.12_cuda11.7_py3.8
```

### Building for HPC

With Heat being a native HPC library, one would naturally want to build the container
image for HPC systems as well, such as the ones available at [Jülich Supercomputing Centre
(JSC)](https://www.fz-juelich.de/jsc/ "Juelich Supercomputing Centre"). We show two ways to convert the existing images from the registry into Singularity containers.

#### Apptainer (formerly Singularity)

To use one of the existing images from our registry:

    $ apptainer build heat.sif docker://ghcr.io/helmholtz-analytics/heat:1.2.0-dev_torch1.12_cuda11.7_py3.8

Building the image can require root access on some systems. If that is the case, we recommend building the image on a local machine and then uploading it to the desired HPC system.

If you see an error indicating that there is not enough space, use the `--tmpdir` flag of the build command (see the [Apptainer docs](https://apptainer.org/docs/user/latest/build_a_container.html)).
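
A build that redirects temporary files might look like the sketch below. The directory `/tmp/apptainer-build` is a hypothetical placeholder; substitute any location with enough free space. The script is a dry run unless `DRY_RUN=0` is set.

```shell
# Dry-run sketch: build the image with a custom temporary directory.
BUILD_TMP="/tmp/apptainer-build"   # hypothetical path with enough free space
IMAGE="docker://ghcr.io/helmholtz-analytics/heat:1.2.0-dev_torch1.12_cuda11.7_py3.8"

set -- apptainer build --tmpdir "${BUILD_TMP}" heat.sif "${IMAGE}"
BUILD_CMD="$*"
echo "${BUILD_CMD}"

# Set DRY_RUN=0 to create the directory and run the build.
if [ "${DRY_RUN:-1}" = "0" ]; then mkdir -p "${BUILD_TMP}" && "$@"; fi
```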

#### SIB (Singularity Image Builder)

A simple `Dockerfile` (in addition to the one above) to be used with SIB could look like
this:

    FROM ghcr.io/helmholtz-analytics/heat:1.2.0_torch1.12_cuda11.7_py3.8

The invocation to build the image would be:

    $ sib upload ./Dockerfile heat_1.2.0_torch1.12_cuda11.7_py3.8
    $ sib build --recipe-name heat_1.2.0_torch1.12_cuda11.7_py3.8
    $ sib download --recipe-name heat_1.2.0_torch1.12_cuda11.7_py3.8

However, SIB is capable of using just about any available Docker image from any
registry, such that a specific Singularity image can be built by simply referencing the
available image. SIB is thus used as a conversion tool.

## Running on HPC

    $ singularity run --nv heat_1.2.0_torch1.11_cuda11.5_py3.9.sif /bin/bash
    $ python
    Python 3.8.13 (default, Mar 28 2022, 11:38:47)
    [GCC 7.5.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import heat as ht
    ...

The `--nv` argument to `singularity` enables NVidia GPU support, which is desired for
Heat.
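
One way to check that the GPU is actually visible inside the container is to ask PyTorch directly. This is only a sketch: it assumes the image file is named `heat.sif` and sits in the current directory, and it skips the check when `singularity` or the file is missing.

```shell
# Sketch: verify GPU visibility inside the container via PyTorch.
SIF="heat.sif"   # assumed image file name; adjust to your .sif file

if command -v singularity >/dev/null 2>&1 && [ -f "${SIF}" ]; then
  # Prints True when a CUDA device is visible inside the container.
  singularity exec --nv "${SIF}" python -c "import torch; print(torch.cuda.is_available())"
  GPU_CHECK="ran"
else
  echo "singularity or ${SIF} not found; skipping GPU check"
  GPU_CHECK="skipped"
fi
```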

### Multi-node example

The following file is an example of how to use the Singularity image together with SLURM, which allows Heat to run in a multi-node environment.

```bash
#!/bin/bash
#SBATCH --time 0:10:00
#SBATCH --nodes 2
#SBATCH --tasks-per-node 2

...

srun --mpi="pmi2" singularity exec --nv heat_1.2.0_torch1.11_cuda11.5_py3.9.sif bash -c "cd ~/code/heat/examples/lasso; python demo.py"
```
2 changes: 2 additions & 0 deletions docker/singularity-dockerfile.sample
@@ -0,0 +1,2 @@
# This is a sample file to use with the Singularity image builder
FROM ghcr.io/helmholtz-analytics/heat:1.2.0_torch1.11_cuda11.5_py3.9
2 changes: 2 additions & 0 deletions docker/tzdata.seed
@@ -0,0 +1,2 @@
tzdata tzdata/Areas select Europe
tzdata tzdata/Zones/Europe select Berlin
21 changes: 17 additions & 4 deletions quick_start.md
@@ -30,11 +30,24 @@ pip install heat[hdf5,netcdf]
```
[Test](#test) your installation.

### HPC
Work in progress...

### Docker
Work in progress ([PR 970](https://github.com/helmholtz-analytics/heat/pull/970))

Get the docker image from our package repository

```
docker pull ghcr.io/helmholtz-analytics/heat:1.2.0-dev_torch1.12_cuda11.7_py3.8
```

or build it from our Dockerfile

```
git clone https://github.com/helmholtz-analytics/heat.git
cd heat/docker
docker build -t heat:latest .
```

See [our docker README](https://github.com/helmholtz-analytics/heat/tree/main/docker/README.md) for other details.


### Test
In your terminal, test your setup with the [`heat_test.py`](https://github.com/helmholtz-analytics/heat/blob/main/scripts/heat_test.py) script: