Cromwell task monitor

This repo is forked from the Broad Institute and contains code for monitoring resource utilization in Cromwell tasks running on Google Cloud Life Sciences API v2beta.

After cloning this repo, run the following from within the root directory:

$ mamba env create --file environment.yml
$ conda activate cromwell_monitor
$ pre-commit install # install git hook scripts

Run unit tests with:

$ make test

The monitoring script is intended to be used through a Docker image (as part of an associated "monitoring action").

It uses psutil to continuously measure CPU, memory and disk utilization as well as disk IOPS, and periodically reports them as custom metrics to the Cloud Monitoring API.
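The measurement step described above can be sketched roughly as follows. This is a minimal illustration and not the code in monitor.py: the names `IoSnapshot`, `disk_iops` and `sample`, the returned dictionary keys, and the one-second interval are all assumptions made for the example. Only the psutil calls themselves (`cpu_percent`, `virtual_memory`, `disk_usage`, `disk_io_counters`) are real library functions.

```python
from collections import namedtuple

# Hypothetical stand-in holding the two fields used below.
# psutil.disk_io_counters() returns a namedtuple that also exposes
# read_count and write_count.
IoSnapshot = namedtuple("IoSnapshot", ["read_count", "write_count"])


def disk_iops(prev, curr, interval_s):
    """Read and write operations per second between two counter snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (
        (curr.read_count - prev.read_count) / interval_s,
        (curr.write_count - prev.write_count) / interval_s,
    )


def sample(interval_s=1.0, path="/"):
    """Take one measurement over interval_s seconds (requires psutil)."""
    import psutil  # imported lazily so disk_iops() works without psutil

    prev = psutil.disk_io_counters()
    # cpu_percent blocks for interval_s and returns utilization over that window.
    cpu_pct = psutil.cpu_percent(interval=interval_s)
    curr = psutil.disk_io_counters()
    read_iops, write_iops = disk_iops(prev, curr, interval_s)
    return {
        "cpu_pct": cpu_pct,
        "mem_pct": psutil.virtual_memory().percent,
        "disk_pct": psutil.disk_usage(path).percent,
        "disk_read_iops": read_iops,
        "disk_write_iops": write_iops,
    }
```

IOPS is derived from the difference between two cumulative counters, which is why the sketch takes a snapshot before and after the sampling window instead of reading a single value.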

The labels for each time point contain the following metadata:

Cromwell-specific values, such as workflow ID, task call name, index and attempt.
GCP instance values such as instance name, zone, number of CPU cores, total memory and disk size.
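As an illustration of the metadata listed above, the labels for one time point might be assembled like this. The function name `build_labels` and every label key are hypothetical and chosen for the example; the actual keys used by gcp_monitor.py may differ. Values are converted to strings because metric labels on a Cloud Monitoring time series are string-valued.

```python
def build_labels(workflow_id, task_call_name, index, attempt,
                 instance_name, zone, cpu_count, mem_gb, disk_gb):
    """Combine Cromwell and GCP instance metadata into one label mapping.

    All keys below are illustrative assumptions, not the repo's real names.
    """
    labels = {
        # Cromwell-specific values
        "workflow_id": workflow_id,
        "task_call_name": task_call_name,
        "task_call_index": index,
        "task_call_attempt": attempt,
        # GCP instance values
        "instance_name": instance_name,
        "zone": zone,
        "cpu_count": cpu_count,
        "mem_size_gb": mem_gb,
        "disk_size_gb": disk_gb,
    }
    # Stringify every value so the mapping can be used as metric labels.
    return {key: str(value) for key, value in labels.items()}
```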

This approach enables:

Users to easily plot real-time resource usage statistics across all tasks in a workflow, or for a single task call across many workflow runs, etc. This can be very powerful for quickly identifying outlier tasks that could use optimization, without the need for any configuration or code.
Scripts to easily get aggregate statistics on resource utilization and to produce suggestions based on those.
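A script of the second kind could look something like the sketch below. It assumes the reported samples have already been fetched and reduced to `(task_name, cpu_percent)` pairs; that input shape, the function names `summarize` and `suggest`, and the 50% threshold are all assumptions for illustration and are not part of this repo.

```python
from collections import defaultdict
from statistics import mean


def summarize(samples):
    """Group (task_name, cpu_percent) pairs and compute mean and peak per task."""
    by_task = defaultdict(list)
    for task, cpu_pct in samples:
        by_task[task].append(cpu_pct)
    return {
        task: {"mean": mean(values), "peak": max(values)}
        for task, values in by_task.items()
    }


def suggest(summary, low_peak_pct=50.0):
    """Return tasks whose peak CPU stayed below the (hypothetical) threshold.

    Such tasks are candidates for requesting fewer CPU cores.
    """
    return sorted(
        task for task, stats in summary.items() if stats["peak"] < low_peak_pct
    )
```

For example, `suggest(summarize(samples))` returns the names of tasks that never exceeded half of their allocated CPU in the supplied samples.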

TestMonitoring.wdl can be used to verify that the monitoring action/container is working as intended.
