reichlab/operational-models

operational-models

Repository for automating runs of operational disease forecasting models.

Docker instructions

This project containerizes its models using a shared Dockerfile and run.sh. Model-specific settings are passed as build arguments to docker build and as environment variables to docker run, as documented below. The basic steps for containerizing a new model (building the image, running it locally, and publishing it) are covered in the sections that follow.

To build the image

Environment variables: Building the image for a particular model uses the following variable, which is passed to docker build via --build-arg:

  • (required) MODEL_DIR: specifies the directory name (not full path) of the model being built. Example: MODEL_DIR=flu_ar2.

Example build command:

cd "path-to-this-repo"
docker build --build-arg MODEL_DIR=flu_ar2 --tag=flu_ar2:1.0 --file=Dockerfile .
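
The build step can be wrapped in a small shell function that checks MODEL_DIR before calling docker build. This is only a sketch: build_model_image is a hypothetical helper, not something this repo provides, and it assumes it is run from the repo root so that the model directory and the Dockerfile are both in the current directory.

```shell
# Hypothetical helper (not part of this repo): validate MODEL_DIR, then build.
# Assumes the current directory is the repo root.
build_model_image() {
  local model_dir="${1:-}"
  local version="${2:-1.0}"

  if [ -z "$model_dir" ]; then
    echo "usage: build_model_image MODEL_DIR [VERSION]" >&2
    return 2
  fi
  # MODEL_DIR is a directory name, not a full path, so reject slashes.
  case "$model_dir" in
    */*)
      echo "error: MODEL_DIR must be a directory name, not a path: $model_dir" >&2
      return 2
      ;;
  esac
  if [ ! -d "$model_dir" ]; then
    echo "error: model directory not found: $model_dir" >&2
    return 1
  fi

  # Same command as the example above, with the model name substituted.
  docker build \
    --build-arg "MODEL_DIR=$model_dir" \
    --tag="$model_dir:$version" \
    --file=Dockerfile .
}

# Usage (from the repo root):
#   build_model_image flu_ar2 1.0
```

Failing early on a mistyped directory name avoids waiting for the build to fail partway through.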

To run the image locally

Environment variables: There are two sources of environment variables used by this repo's containerization approach:

  1. We use reichlab/container-utils to manage variables for GitHub credentials and Slack integration (messages and uploads). It requires the following variables (see that repo's README.md for details):
    • SLACK_API_TOKEN, CHANNEL_ID (required): used by slack.sh
    • GH_TOKEN, GIT_USER_NAME, GIT_USER_EMAIL, GIT_CREDENTIALS (required): used by load-env-vars.sh
    • DRY_RUN (optional): when set to any value, prevents git commit actions (by default, commits are made).
  2. This repo's run.sh is parameterized to work with this repo's different models, so running the image for a particular model uses the following environment variables. These can be passed via docker run's --env or --env-file args.
    • MODEL_NAME (required): Hub name of the model (i.e., the name used in model outputs). Example: MODEL_NAME=UMass-AR2
    • REPO_NAME (required): Name of the repository being cloned. Example: REPO_NAME=FluSight-forecast-hub
    • REPO_URL (required): Full URL of the repository being cloned, excluding ".git". Example: REPO_URL=https://github.com/reichlab/FluSight-forecast-hub
    • REPO_UPSTREAM_URL (required): Full URL of the repository that REPO_URL was forked from, excluding ".git". Example: REPO_UPSTREAM_URL=https://github.com/cdcepi/FluSight-forecast-hub
    • MAIN_PY_ARGS (optional): Specifies arguments that are passed through to run.sh's call to the particular model's main.py. Note that these arguments are model-specific. For example, the flu_flusion model accepts two args: MAIN_PY_ARGS=--today_date=2024-11-27 --short_run=True whereas the flu_ar2 model accepts only the former arg.
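
Because ten of these variables are required, a pre-flight check before docker run can catch an omission early. The function below is a minimal sketch, not part of run.sh or container-utils: check_required_env is a hypothetical name, and it assumes the variables are set in the calling shell (for example, when they are passed through with --env NAME, which forwards the host value). It also enforces the rule that the two repository URLs exclude ".git".

```shell
# Hypothetical pre-flight check (not part of run.sh): report every required
# variable that is unset or empty in the calling shell.
check_required_env() {
  local missing=0 name value

  for name in SLACK_API_TOKEN CHANNEL_ID \
              GH_TOKEN GIT_USER_NAME GIT_USER_EMAIL GIT_CREDENTIALS \
              MODEL_NAME REPO_NAME REPO_URL REPO_UPSTREAM_URL; do
    # Indirect lookup; the names are fixed literals, so eval is safe here.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done

  # Both repository URLs must be given without a trailing ".git".
  for name in REPO_URL REPO_UPSTREAM_URL; do
    eval "value=\${$name:-}"
    case "$value" in
      *.git)
        echo "$name must not end in .git: $value" >&2
        missing=1
        ;;
    esac
  done

  return "$missing"
}

# Usage:
#   check_required_env && docker run --rm --env MODEL_NAME ... flu_ar2:1.0
```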

Example run command:

docker run --rm \
  --env-file path_to_env_file/git-and-slack-credentials.env \
  --env MODEL_NAME="UMass-AR2" \
  --env REPO_NAME="FluSight-forecast-hub" \
  ... \
  --env DRY_RUN=1 \
  flu_ar2:1.0
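
As an alternative to repeating --env flags, the model-specific variables can live in a second env file. The sketch below writes one using the example values from the variable list above; write_model_env and the file name flu_ar2.env are hypothetical, and the values should be checked against your own fork and model. Values are written without quotes because docker run --env-file reads the text after "=" verbatim.

```shell
# Hypothetical helper: write a model-specific env file for docker run.
# The values are the examples from the variable list above, not verified
# settings for any particular deployment.
write_model_env() {
  cat > "$1" <<'EOF'
MODEL_NAME=UMass-AR2
REPO_NAME=FluSight-forecast-hub
REPO_URL=https://github.com/reichlab/FluSight-forecast-hub
REPO_UPSTREAM_URL=https://github.com/cdcepi/FluSight-forecast-hub
MAIN_PY_ARGS=--today_date=2024-11-27
DRY_RUN=1
EOF
}

# Usage (docker run accepts --env-file more than once):
#   write_model_env flu_ar2.env
#   docker run --rm \
#     --env-file path_to_env_file/git-and-slack-credentials.env \
#     --env-file flu_ar2.env \
#     flu_ar2:1.0
```

Remove the DRY_RUN line once you want the run to make commits.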

To publish the image

Use the following commands to build and push an image. These use the flu_ar2 model as an example.

Note: We build for the amd64 architecture because that is what most Linux-based servers (including AWS) use natively, whereas Apple silicon Macs use arm64.

Note: As of this writing, on Macs with Apple silicon, specifying --platform=linux/amd64 causes the build to fail unless you disable Rosetta in Docker Desktop. For details, see "Buildx throws Illegal Instruction installing ca-certificates when building for linux/amd64 on M2" (#7255).

cd "path-to-this-repo"
docker login -u "reichlab" docker.io
docker build --platform=linux/amd64 --build-arg MODEL_DIR=flu_ar2 --tag=reichlab/flu_ar2:1.0 --file=Dockerfile .
docker push reichlab/flu_ar2:1.0
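
Before pushing, it can be worth confirming that the local image really was built for linux/amd64, especially when building on an arm64 Mac. The function below is a sketch under that assumption: verify_image_platform is a hypothetical helper, and it relies on docker image inspect with a --format template that reads the image's Os and Architecture fields.

```shell
# Hypothetical helper: compare an image's platform with the expected one.
verify_image_platform() {
  local image="${1:-}"
  local expected="${2:-linux/amd64}"
  local actual

  if [ -z "$image" ]; then
    echo "usage: verify_image_platform IMAGE [OS/ARCH]" >&2
    return 2
  fi

  # Prints something like "linux/amd64" for a locally available image.
  actual="$(docker image inspect --format '{{.Os}}/{{.Architecture}}' "$image")" || return 1

  if [ "$actual" != "$expected" ]; then
    echo "error: $image is $actual, expected $expected" >&2
    return 1
  fi
  echo "$image is $actual"
}

# Usage:
#   verify_image_platform reichlab/flu_ar2:1.0 && docker push reichlab/flu_ar2:1.0
```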

requirements.txt and renv.lock details

Each model has different R and Python library requirements. These are captured in a Python requirements.txt file and an renv renv.lock file, both stored in the model's subdirectory. The following sections describe how to create them.

requirements.txt

How you generate this file depends on your Python tooling. For example, with pipenv, run pipenv requirements > requirements.txt.
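
Whichever tool produces the file, a quick check that every requirement carries an exact "==" version can help keep image builds reproducible. This is a sketch that assumes exact pinning is the goal, which this README does not state; check_pinned is a hypothetical helper. It skips blank lines, comments, and option lines such as the "-i" index line.

```shell
# Hypothetical helper: fail if any requirement lacks an exact "==" pin.
# Assumes exact pinning is wanted; that is not stated in this README.
check_pinned() {
  local file="${1:-requirements.txt}"
  local unpinned

  # Drop blank lines, comments, and option lines (those starting with "-"),
  # then keep whatever remains that has no "==".
  unpinned="$(grep -Ev '^[[:space:]]*($|#|-)' "$file" | grep -v '==' || true)"

  if [ -n "$unpinned" ]; then
    echo "unpinned requirements in $file:" >&2
    echo "$unpinned" >&2
    return 1
  fi
}

# Usage (from a model's subdirectory):
#   check_pinned requirements.txt
```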

renv.lock

Generate renv.lock via the following steps. As noted above, requirements differ between models, so the "install required R libraries" step will vary. Below we show the commands for the flu_ar2 model, but you will need to change them for yours.

  • start a fresh temporary rocker/r-ver:4.3.2 container via docker run --rm -it --name temp_container rocker/r-ver:4.3.2 /bin/bash
  • install the required OS libraries and applications (see "install general OS utilities" and "install OS binaries required by R packages" in the Dockerfile)
  • install renv via Rscript -e "install.packages('renv', repos = c(CRAN = 'https://cloud.r-project.org'))"
  • create a project directory via mkdir proj ; cd proj
  • initialize renv via Rscript -e "renv::init(bare = TRUE)"
  • install required R libraries. NB: these will vary depending on the model:
    Rscript -e "renv::install(c('lubridate', 'readr', 'remotes'))"
    Rscript -e "renv::install('arrow', repos = c('https://apache.r-universe.dev', 'https://cran.r-project.org'))"
    Rscript -e "renv::install('reichlab/zoltr')"
    Rscript -e "renv::install('hubverse-org/hubData')"
    Rscript -e "renv::install('hubverse-org/hubVis')"
  • create renv.lock from within the R interpreter (this fails in bash) via renv::settings$snapshot.type('all') ; renv::snapshot()
  • copy the new /proj/renv.lock file out from the container
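
The final copy step can be done with docker cp from a second terminal on the host. It needs to happen while the container is still running, because the --rm flag removes the container on exit. The function below is a sketch: copy_renv_lock is a hypothetical helper, and the sanity check assumes only that an renv lockfile is JSON with a top-level "Packages" entry.

```shell
# Hypothetical helper: copy renv.lock out of the running temporary container.
# Run on the host, in a second terminal, before exiting the container.
copy_renv_lock() {
  local container="${1:-temp_container}"
  local dest_dir="${2:-.}"

  docker cp "$container:/proj/renv.lock" "$dest_dir/renv.lock" || return 1

  # Light sanity check: the lockfile should contain a "Packages" entry.
  if ! grep -q '"Packages"' "$dest_dir/renv.lock"; then
    echo "warning: $dest_dir/renv.lock has no \"Packages\" entry" >&2
    return 1
  fi
  echo "copied $dest_dir/renv.lock"
}

# Usage (copy into the flu_ar2 model directory):
#   copy_renv_lock temp_container flu_ar2
```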
