Dockerfile and accompanying documentation #970

Merged 20 commits on Jun 20, 2023

Commits
3664934 Dockerfile and accompanying documentation (bhagemeier, Apr 28, 2022)
6dc85a8 README.md describing containerization (bhagemeier, Oct 5, 2022)
56211be Fix indentation in README.md (bhagemeier, Oct 26, 2022)
aea60ec Docker support (bhagemeier, Oct 21, 2022)
00025f9 Ensure mpi4py installation from source (bhagemeier, Oct 27, 2022)
a23ff2d Migrate to NVidia PyTorch base image (bhagemeier, Oct 28, 2022)
7d870a7 Provide sample file for Singularity (bhagemeier, Oct 28, 2022)
351f9c1 feat: singularity definition file and slurm multi-node example in the… (JuanPedroGHM, Jan 24, 2023)
64b474f docs: quick_start.md has a docker section with link to docker readme (JuanPedroGHM, Mar 3, 2023)
e7e0716 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Mar 3, 2023)
dace7c6 Merge branch 'main' into features/897-containerization (ClaudiaComito, Apr 17, 2023)
072d504 ci: docker cleanup (JuanPedroGHM, Apr 24, 2023)
5fbd116 ci: build docker action, updated docs (JuanPedroGHM, Apr 25, 2023)
3770196 Apply suggestions from code review (JuanPedroGHM, May 8, 2023)
97bdb42 README suggestions (JuanPedroGHM, May 8, 2023)
d238d17 Merge branch 'main' into features/897-containerization (ClaudiaComito, May 22, 2023)
0c4d6ba docs: removed system specific flag from example slurm file (JuanPedroGHM, May 30, 2023)
3d9548f Merge branch 'main' into features/897-containerization (ClaudiaComito, May 31, 2023)
4d73bad Merge branch 'main' into features/897-containerization (ClaudiaComito, Jun 7, 2023)
2cd4785 Merge branch 'main' into features/897-containerization (ClaudiaComito, Jun 20, 2023)
68 changes: 68 additions & 0 deletions .github/workflows/docker.yml
@@ -0,0 +1,68 @@
name: 'Build and upload Docker img'
on:
  workflow_dispatch:
    inputs:
      heat_version:
        description: 'Heat version'
        required: true
        default: '1.2.2'
        type: string
      pytorch_img:
        description: 'Base PyTorch Img'
        required: true
        default: '23.03-py3'
        type: string
      name:
        description: 'Output Image name'
        required: true
        default: 'heat:1.2.2_torch1.13_cu12.1'
        type: string
jobs:
  build-and-push-img:
    runs-on: ubuntu-latest
    steps:
      -
        name: Checkout
        uses: actions/checkout@v3
      -
        name: Set up QEMU
        uses: docker/setup-qemu-action@v2
      -
        name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
        with:
          driver: docker
      -
        name: Login to GitHub Container Registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}
      -
        name: Build
        uses: docker/build-push-action@v4
        with:
          context: docker/
          build-args: |
            HEAT_VERSION=${{ inputs.heat_version }}
            PYTORCH_IMG=${{ inputs.pytorch_img}}
          load: true
          tags: |
            test_${{ inputs.name }}
      -
        name: Test
        run: |
          docker images
          docker run -v `pwd`:`pwd` -w `pwd` --rm test_${{ inputs.name }} pytest
      -
        name: Build and push
        uses: docker/build-push-action@v4
        with:
          context: docker/
          build-args: |
            HEAT_VERSION=${{ inputs.heat_version }}
            PYTORCH_IMG=${{ inputs.pytorch_img}}
          push: true
          tags: |
            ghcr.io/helmholtz-analytics/${{ inputs.name }}
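
Because the workflow is triggered through `workflow_dispatch`, it has to be started manually with its three inputs. A minimal sketch of how that could look with the GitHub CLI, assuming `gh` is installed and authenticated for the repository. The helper name `docker_workflow_dispatch_cmd` is hypothetical, and the function only prints the command instead of running it:

```shell
# Hypothetical helper (not part of the PR): prints a GitHub CLI command
# that dispatches the workflow with the three inputs it declares.
docker_workflow_dispatch_cmd() {
  heat_version="$1"   # Heat release to install, e.g. 1.2.2
  pytorch_img="$2"    # tag of the NVIDIA PyTorch base image, e.g. 23.03-py3
  name="$3"           # output image name, e.g. heat:1.2.2_torch1.13_cu12.1
  echo "gh workflow run docker.yml -f heat_version=${heat_version} -f pytorch_img=${pytorch_img} -f name=${name}"
}

# Print the command for the default input values of the workflow.
docker_workflow_dispatch_cmd 1.2.2 23.03-py3 heat:1.2.2_torch1.13_cu12.1
```

Printing the command first makes it easy to check that the image name matches the Heat and PyTorch versions before anything is pushed to the registry.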
21 changes: 21 additions & 0 deletions docker/Dockerfile
@@ -0,0 +1,21 @@
ARG PACKAGE_NAME=heat
ARG HEAT_VERSION=1.2.2
ARG PYTORCH_IMG=22.05-py3
ARG HEAT_BRANCH=main
ARG INSTALL_TYPE=release

FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base
COPY ./tzdata.seed /tmp/tzdata.seed
RUN debconf-set-selections /tmp/tzdata.seed
RUN apt update && DEBIAN_FRONTEND=noninteractive apt install -y build-essential openssh-client python3-dev git && apt clean && rm -rf /var/lib/apt/lists/*

FROM base AS source-install
ARG HEAT_BRANCH
RUN git clone -b ${HEAT_BRANCH} https://github.com/helmholtz-analytics/heat.git ; cd heat; pip install mpi4py --no-binary :all: ; pip install .[hdf5,netcdf]; pip cache purge ; cd ..; rm -rf heat

FROM base AS release-install
ARG PACKAGE_NAME
ARG HEAT_VERSION
RUN pip install mpi4py --no-binary :all: ; if [ "x${HEAT_VERSION}" = "x" ]; then pip install ${PACKAGE_NAME}[hdf5,netcdf]; else pip install ${PACKAGE_NAME}[hdf5,netcdf]==${HEAT_VERSION}; fi ; pip cache purge ; true

FROM ${INSTALL_TYPE}-install AS final
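
The `release-install` stage pins Heat only when `HEAT_VERSION` is non-empty. A minimal sketch of that branch as a standalone shell function; the name `heat_requirement` is hypothetical, and the Dockerfile inlines the equivalent test in its `RUN` line:

```shell
# Hypothetical helper: prints the pip requirement that the release stage
# would install for a given package name and (possibly empty) version.
heat_requirement() {
  package="${1:-heat}"   # corresponds to PACKAGE_NAME
  version="${2:-}"       # corresponds to HEAT_VERSION; empty means unpinned
  if [ -z "$version" ]; then
    echo "${package}[hdf5,netcdf]"
  else
    echo "${package}[hdf5,netcdf]==${version}"
  fi
}

heat_requirement heat 1.2.2   # pinned release
heat_requirement heat ""      # unpinned, resolves to the latest release
```

Since `HEAT_VERSION` defaults to `1.2.2` in the Dockerfile, the unpinned branch is only reached when the argument is explicitly set to an empty value.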
90 changes: 90 additions & 0 deletions docker/README.md
@@ -0,0 +1,90 @@
# Docker images of Heat

There is some flexibility in how the Docker images of Heat can be built.

Firstly, one can build from a released version taken from PyPI. This will be the
version set through the `--build-arg=HEAT_VERSION=1.2.0` argument (the Dockerfile
defaults to 1.2.2), or the latest release if `HEAT_VERSION` is empty.

Secondly, one can build a Docker image from the GitHub sources, selected through
`--build-arg=INSTALL_TYPE=source`. The default branch to be built is `main`; other
branches can be specified using `--build-arg=HEAT_BRANCH=branchname`.
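
The two build variants can be sketched as a small shell helper. It only prints the `docker build` command rather than running it. The function name `heat_build_cmd` and the `heat:local` tag are illustrative assumptions, while the build arguments are the ones defined in the Dockerfile:

```shell
# Hypothetical helper (not part of Heat): prints the docker build command
# for a release or source build without executing it.
heat_build_cmd() {
  install_type="${1:-release}"  # "release" (PyPI) or "source" (GitHub)
  ref="${2:-}"                  # Heat version for release, branch for source
  cmd="docker build --build-arg INSTALL_TYPE=${install_type}"
  if [ "$install_type" = "release" ]; then
    if [ -n "$ref" ]; then cmd="$cmd --build-arg HEAT_VERSION=${ref}"; fi
  elif [ "$install_type" = "source" ]; then
    if [ -n "$ref" ]; then cmd="$cmd --build-arg HEAT_BRANCH=${ref}"; fi
  else
    echo "unknown install type: ${install_type}" >&2
    return 1
  fi
  echo "$cmd -t heat:local ."
}

heat_build_cmd release 1.2.0   # release build pinned to a version
heat_build_cmd source main     # source build from a branch
```

Each `--build-arg` is passed as its own flag, so further arguments such as `PYTORCH_IMG` could be appended in the same way.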

## General build

### Docker

The [Dockerfile](./Dockerfile) guiding the build of the Docker image is located in this
directory. It is typically most convenient to `cd` into it and run the Docker build as:

```console
$ docker build --build-arg HEAT_VERSION=1.2.2 --build-arg PYTORCH_IMG=22.05-py3 -t heat:local .
```

We also offer prebuilt images in our [Package registry](https://github.com/helmholtz-analytics/heat/pkgs/container/heat) from which you can pull existing images:


```console
$ docker pull ghcr.io/helmholtz-analytics/heat:1.2.0-dev_torch1.12_cuda11.7_py3.8
```

### Building for HPC

With Heat being a native HPC library, one would naturally also want to build the container
image for HPC systems, such as the ones available at [Jülich Supercomputing Centre
(JSC)](https://www.fz-juelich.de/jsc/ "Juelich Supercomputing Centre"). We show two ways to
convert the existing images from the registry into Singularity containers.

#### Apptainer (formerly Singularity)

To use one of the existing images from our registry:

$ apptainer build heat.sif docker://ghcr.io/helmholtz-analytics/heat:1.2.0-dev_torch1.12_cuda11.7_py3.8

Building the image can require root access on some systems. If that is the case, we recommend building the image on a local machine and then uploading it to the desired HPC system.

If you see an error indicating that there is not enough space, use the `--tmpdir` flag of the build command (see the [Apptainer docs](https://apptainer.org/docs/user/latest/build_a_container.html)).
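
A minimal sketch of a build that redirects temporary files to another directory. The helper name `apptainer_build_cmd` and the example directory are assumptions for illustration, and the function only prints the command so it can be inspected before running it:

```shell
# Hypothetical helper: prints an apptainer build command that places
# temporary build files in the given directory.
apptainer_build_cmd() {
  tmpdir="$1"   # directory with enough free space (assumed to exist)
  image="$2"    # Docker image reference without the docker:// prefix
  echo "apptainer build --tmpdir ${tmpdir} heat.sif docker://${image}"
}

# Example with an assumed scratch directory.
apptainer_build_cmd /tmp/apptainer-build \
  ghcr.io/helmholtz-analytics/heat:1.2.0-dev_torch1.12_cuda11.7_py3.8
```

On HPC systems a scratch file system is usually a better choice for the temporary directory than a small local `/tmp`.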

#### SIB (Singularity Image Builder)

A simple `Dockerfile` (in addition to the one above) to be used with SIB could look like
this:

FROM ghcr.io/helmholtz-analytics/heat:1.2.0_torch1.12_cuda11.7_py3.8

The invocation to build the image would be:

$ sib upload ./Dockerfile heat_1.2.0_torch1.12_cuda11.7_py3.8
$ sib build --recipe-name heat_1.2.0_torch1.12_cuda11.7_py3.8
$ sib download --recipe-name heat_1.2.0_torch1.12_cuda11.7_py3.8

However, SIB is capable of using just about any available Docker image from any
registry, such that a specific Singularity image can be built by simply referencing the
available image. SIB is thus used as a conversion tool.

## Running on HPC

$ singularity run --nv heat_1.2.0_torch1.11_cuda11.5_py3.9.sif /bin/bash
$ python
Python 3.8.13 (default, Mar 28 2022, 11:38:47)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import heat as ht
...

The `--nv` argument to `singularity` enables NVIDIA GPU support, which is desired for
Heat.
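
One way to check that `--nv` actually exposes a GPU inside the container is to ask PyTorch, which the image is built on. This is a sketch only: the helper name `gpu_check_cmd` is hypothetical, and it prints the command instead of executing it, so it can be pasted on a GPU node:

```shell
# Hypothetical helper: prints a command that reports whether PyTorch
# can see a CUDA device inside the given container image.
gpu_check_cmd() {
  sif="$1"   # path to the Singularity image file
  printf '%s\n' "singularity exec --nv ${sif} python -c 'import torch; print(torch.cuda.is_available())'"
}

gpu_check_cmd heat.sif
```

Running the printed command without `--nv` on the same node is a quick way to confirm that the flag is what makes the device visible.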

### Multi-node example

The following file is an example of how to use the Singularity image together with SLURM, which allows Heat to work in a multi-node environment.

```bash
#!/bin/bash
#SBATCH --time 0:10:00
#SBATCH --nodes 2
#SBATCH --tasks-per-node 2

...

srun --mpi="pmi2" singularity exec --nv heat_1.2.0_torch1.11_cuda11.5_py3.9.sif bash -c "cd ~/code/heat/examples/lasso; python demo.py"
```
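
Before submitting a real workload, it can help to confirm that every task gets its own MPI rank inside the container. The sketch below generates a batch script for such a check with `mpi4py`, which the image installs. The generator name `make_rank_check_job` and the image path `heat.sif` are assumptions, and the `#SBATCH` options mirror the example above:

```shell
# Hypothetical generator: prints a SLURM batch script that runs a small
# mpi4py rank check inside the container on every task.
make_rank_check_job() {
  sif="$1"     # path to the Singularity/Apptainer image
  nodes="$2"   # number of nodes to request
  tasks="$3"   # tasks per node
  cat <<EOF
#!/bin/bash
#SBATCH --time 0:10:00
#SBATCH --nodes ${nodes}
#SBATCH --tasks-per-node ${tasks}

srun --mpi="pmi2" singularity exec --nv ${sif} python -c "from mpi4py import MPI; c = MPI.COMM_WORLD; print(c.Get_rank(), c.Get_size())"
EOF
}

# Print a job script for 2 nodes with 2 tasks each.
make_rank_check_job heat.sif 2 2
```

If MPI is wired up correctly between SLURM and the container, each task should report a distinct rank and the same total size. Tasks that all report rank 0 with size 1 usually indicate that the `--mpi` setting does not match the MPI library inside the image.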
2 changes: 2 additions & 0 deletions docker/singularity-dockerfile.sample
@@ -0,0 +1,2 @@
# This is a sample file to use with the Singularity image builder
FROM ghcr.io/helmholtz-analytics/heat:1.2.0_torch1.11_cuda11.5_py3.9
2 changes: 2 additions & 0 deletions docker/tzdata.seed
@@ -0,0 +1,2 @@
tzdata tzdata/Areas select Europe
tzdata tzdata/Zones/Europe select Berlin
21 changes: 17 additions & 4 deletions quick_start.md
@@ -30,11 +30,24 @@ pip install heat[hdf5,netcdf]
```
[Test](#test) your installation.

### HPC
Work in progress...

### Docker
Work in progress ([PR 970](https://github.com/helmholtz-analytics/heat/pull/970))

Get the Docker image from our package registry:

```
docker pull ghcr.io/helmholtz-analytics/heat:1.2.0-dev_torch1.12_cuda11.7_py3.8
```

or build it from our Dockerfile:

```
git clone https://github.com/helmholtz-analytics/heat.git
cd heat/docker
docker build -t heat:latest .
```

See [our docker README](https://github.com/helmholtz-analytics/heat/tree/main/docker/README.md) for other details.


### Test
In your terminal, test your setup with the [`heat_test.py`](https://github.com/helmholtz-analytics/heat/blob/main/scripts/heat_test.py) script: