Repository for automating runs of operational disease forecasting models.
This project supports containerizing its models via reusable Dockerfile and run.sh files. This works by passing various environment variables to the docker build and docker run commands, as documented below. The basic steps for containerizing a new model are:
- Create a subfolder for your model. This is called the MODEL_DIR. An example is flu_ar2.
- Add a README.md and your executable files (e.g., .R and .py files) to that folder (do not use subfolders).
- Generate requirements.txt and renv.lock files as documented below.
- Build and run the image as documented below. You will likely want to create a .env file for running the image (see --env-file at https://docs.docker.com/reference/cli/docker/container/run/#env ). NB: Do not use double quotes around variable values. For details, see the issue titled Handle quotes in --env-file values consistently with Linux/WSL2 "source"ing (#3630).
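The first steps above could be scaffolded roughly as follows. This is only a sketch and not part of the repo's tooling: my_model is a hypothetical MODEL_DIR, a scratch directory stands in for the repo root, and the requirements.txt and renv.lock files are empty placeholders for the real ones generated as documented below.

```shell
# Scratch directory standing in for the root of this repo (assumption for the sketch).
REPO_ROOT="$(mktemp -d)"

# Hypothetical MODEL_DIR; the repo's real example is flu_ar2.
MODEL_DIR="my_model"

# Step 1: create the model subfolder.
mkdir -p "$REPO_ROOT/$MODEL_DIR"

# Step 2: add a README.md and the executable files directly in that folder (no subfolders).
printf '# %s\n' "$MODEL_DIR" > "$REPO_ROOT/$MODEL_DIR/README.md"
: > "$REPO_ROOT/$MODEL_DIR/main.py"   # run.sh calls the model's main.py

# Step 3: placeholders only; the real files are generated as documented below.
: > "$REPO_ROOT/$MODEL_DIR/requirements.txt"
: > "$REPO_ROOT/$MODEL_DIR/renv.lock"

# Show the resulting flat layout.
ls -1 "$REPO_ROOT/$MODEL_DIR"
```

In a real checkout you would replace REPO_ROOT with the path to your clone and fill main.py with the model's actual entry point.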
Environment variables: Building the Dockerfile for a particular model uses the following environment variables:
- (required) MODEL_DIR: specifies the directory name (not full path) of the model being built. Example: MODEL_DIR=flu_ar2
Example build command:
cd "path-to-this-repo"
docker build --build-arg MODEL_DIR=flu_ar2 --tag=flu_ar2:1.0 --file=Dockerfile .
Environment variables: There are two sources of environment variables used by this repo's containerization approach:
- We use reichlab/container-utils to manage variables for GitHub credentials and Slack integration (messages and uploads). It uses the following variables (please see that repo's README.md for details):
  - SLACK_API_TOKEN, CHANNEL_ID (required): used by slack.sh
  - GH_TOKEN, GIT_USER_NAME, GIT_USER_EMAIL, GIT_CREDENTIALS (required): used by load-env-vars.sh
  - DRY_RUN (optional): when set (to anything), stops git commit actions from happening (default is to do commits).
- This repo's run.sh is parameterized to work with this repo's different models, so running the image for a particular model uses the following environment variables. These can be passed via docker run's --env or --env-file args.
  - MODEL_NAME (required): Hub name of the model (i.e., the name used in model outputs). Example: MODEL_NAME=UMass-AR2
  - REPO_NAME (required): Name of the repository being cloned. Example: REPO_NAME=FluSight-forecast-hub
  - REPO_URL (required): Full URL of the repository being cloned, excluding ".git". Example: REPO_URL=https://github.com/reichlab/FluSight-forecast-hub
  - REPO_UPSTREAM_URL (required): Full URL of the repository that REPO_URL was forked from, excluding ".git". Example: REPO_UPSTREAM_URL=https://github.com/cdcepi/FluSight-forecast-hub
  - MAIN_PY_ARGS (optional): Specifies arguments that are passed through to run.sh's call to the particular model's main.py. Note that these arguments are model-specific. For example, the flu_flusion model accepts two args: MAIN_PY_ARGS=--today_date=2024-11-27 --short_run=True whereas the flu_ar2 model accepts only the former arg.
Example run command:
docker run --rm \
--env-file path_to_env_file/git-and-slack-credentials.env \
--env MODEL_NAME="UMass-AR2" \
--env REPO_NAME="FluSight-forecast-hub" \
... \
--env DRY_RUN=1 \
flu_ar2:1.0
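As an alternative to passing each model variable with --env, the variables can be collected in a second env file. The sketch below writes such a file with unquoted values and sanity-checks it before use. The file name flu_ar2.env and the helper check_env_file are hypothetical and not part of this repo. The variable names and example values are the ones documented above.

```shell
# Scratch directory so the sketch does not touch a real checkout.
WORK_DIR="$(mktemp -d)"
ENV_FILE="$WORK_DIR/flu_ar2.env"   # hypothetical file name

# Values are written without double quotes, as required for --env-file.
cat > "$ENV_FILE" <<'EOF'
MODEL_NAME=UMass-AR2
REPO_NAME=FluSight-forecast-hub
REPO_URL=https://github.com/reichlab/FluSight-forecast-hub
REPO_UPSTREAM_URL=https://github.com/cdcepi/FluSight-forecast-hub
MAIN_PY_ARGS=--today_date=2024-11-27
DRY_RUN=1
EOF

# check_env_file (hypothetical helper): reports problems and returns non-zero
# if a required variable is missing/empty or the file contains a double quote.
check_env_file() {
  cef_file="$1"
  cef_status=0
  for cef_name in MODEL_NAME REPO_NAME REPO_URL REPO_UPSTREAM_URL; do
    if ! grep -q "^${cef_name}=." "$cef_file"; then
      echo "missing required variable: ${cef_name}"
      cef_status=1
    fi
  done
  if grep -q '"' "$cef_file"; then
    echo "double quotes found; remove them from the values"
    cef_status=1
  fi
  return "$cef_status"
}

check_env_file "$ENV_FILE" && echo "env file looks OK"

# The file can then be passed alongside the credentials file, for example:
#   docker run --rm \
#     --env-file path_to_env_file/git-and-slack-credentials.env \
#     --env-file "$ENV_FILE" \
#     flu_ar2:1.0
```

The check only covers the model-specific variables; the credentials file is validated separately, if at all.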
Use the following commands to build and push an image. These use the flu_ar2 model as an example.
Note: We build for the amd64 architecture because that's what most Linux-based servers (including AWS) use natively. This is as opposed to Apple Silicon Macs, which have an arm64 architecture.
Note: For Macs with Apple silicon chips, as of this writing, specifying --platform=linux/amd64 causes the build to fail unless you disable Rosetta in Docker Desktop. For details, see the issue titled Buildx throws Illegal Instruction installing ca-certificates when building for linux/amd64 on M2 (#7255).
cd "path-to-this-repo"
docker login -u "reichlab" docker.io
docker build --platform=linux/amd64 --build-arg MODEL_DIR=flu_ar2 --tag=reichlab/flu_ar2:1.0 --file=Dockerfile .
docker push reichlab/flu_ar2:1.0
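The commands above can be parameterized so that the model and version appear in one place. The sketch below is illustrative only: the VERSION and IMAGE variables are hypothetical names, and the script prints the docker commands instead of executing them. It also reports the host architecture, since the Rosetta note above applies only to arm64 hosts.

```shell
MODEL_DIR="flu_ar2"
VERSION="1.0"                                 # hypothetical variable for the image tag
IMAGE="reichlab/${MODEL_DIR}:${VERSION}"

# Report whether the amd64 build is native or cross-platform on this host.
HOST_ARCH="$(uname -m)"
case "$HOST_ARCH" in
  arm64|aarch64) echo "host is ${HOST_ARCH}: the linux/amd64 build is cross-platform (see the Rosetta note)" ;;
  x86_64|amd64)  echo "host is ${HOST_ARCH}: the linux/amd64 build is native" ;;
  *)             echo "host is ${HOST_ARCH}: unrecognized architecture" ;;
esac

# Print the commands for review; remove the leading 'echo' to run them.
echo docker build --platform=linux/amd64 --build-arg "MODEL_DIR=${MODEL_DIR}" "--tag=${IMAGE}" --file=Dockerfile .
echo docker push "${IMAGE}"
```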
Each model has different R and Python library requirements. These are captured via Python requirements.txt and renv renv.lock files that are stored in each model's subdirectory. Following is how to create these.
Generating requirements.txt is somewhat Python tooling-specific. For example, pipenv uses pipenv requirements > requirements.txt
A renv.lock file is generated via the following steps. As noted above, the "install required R libraries via CRAN" step will vary depending on the individual model's needs. Below we show the commands for the flu_ar2 model, but you will need to change them for yours.
- start a fresh temporary rocker/r-ver:4.3.2 container via
docker run --rm -it --name temp_container rocker/r-ver:4.3.2 /bin/bash
- install the required OS libraries and applications (see "install general OS utilities" and "install OS binaries required by R packages" in the Dockerfile)
- install renv via
Rscript -e "install.packages('renv', repos = c(CRAN = 'https://cloud.r-project.org'))"
- create a project directory via
mkdir proj ; cd proj
- initialize renv via
Rscript -e "renv::init(bare = TRUE)"
- install required R libraries. NB: these will vary depending on the model:
Rscript -e "renv::install(c('lubridate', 'readr', 'remotes'))"
Rscript -e "renv::install('arrow', repos = c('https://apache.r-universe.dev', 'https://cran.r-project.org'))"
Rscript -e "renv::install('reichlab/zoltr')"
Rscript -e "renv::install('hubverse-org/hubData')"
Rscript -e "renv::install('hubverse-org/hubVis')"
- create renv.lock from within the R interpreter (this fails in bash) via
renv::settings$snapshot.type('all') ; renv::snapshot()
- copy the new /proj/renv.lock file out from the container
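One way to do the final copy step is docker cp, run from a second terminal on the host while the temporary container is still running (it was started with --rm, so it is removed as soon as its shell exits). This is a sketch under assumptions: the destination path flu_ar2/renv.lock assumes you run it from the repo root for the flu_ar2 model, and the guard simply skips the copy when Docker or the container is unavailable.

```shell
CONTAINER="temp_container"      # the name given via --name in the first step
DEST="flu_ar2/renv.lock"        # assumed destination: the model's subdirectory

# Copy only if docker is installed and the container currently exists.
if command -v docker >/dev/null 2>&1 && docker container inspect "$CONTAINER" >/dev/null 2>&1; then
  docker cp "${CONTAINER}:/proj/renv.lock" "$DEST"
else
  echo "container ${CONTAINER} not found; start it and create renv.lock first"
fi
```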