Commit
Merge pull request #72 from bird-house/sync-raven-testdata-to-thredds
Sync Raven testdata to Thredds for Raven tutorial notebooks.

Leveraging the cron daemon of the scheduler component, sync Raven testdata to Thredds for Raven tutorial notebooks.

Activation of the pre-configured cronjob is via `env.local`, as usual for infra-as-code.

The new generic `deploy-data` script can clone any number of git repos and sync any number of folders in each repo to any number of local folders, with the ability to cherry-pick just the few files needed. Raven testdata has many types of files; we only need to sync `.nc` files to Thredds, to avoid polluting the Thredds storage at `/data/datasets/testdata/raven`.

Limitation of the first version of this `deploy-data` script:

* It does not handle re-organizing the file layout. This is a pure sync only, with very limited rsync filtering for now (tutorial notebooks deploy from multiple repos and need the file layout re-organized).

So the script has room to grow. I see it as a generic solution to the repeated problem "take files from various git repos and deploy them somewhere automatically". If we need to deploy another repo, just write a new config file and stop writing boilerplate code again.

Minor unrelated changes in this PR:

* README update to reference the new birdhouse-deploy-ouranos.
* Make sourcing the various pre-configured cronjobs backward-compatible with older versions of the repo where those cronjobs did not exist yet.
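As a rough illustration of the activation step described in the commit message, an `env.local` fragment might set an override and then source the job definition only if the file exists, which is what keeps the setup working on older checkouts. The checkout path, the `BIRDHOUSE_CHECKOUT`, `JOB_ENV` and `JOB_ENV_SOURCED` names, and the 15-minute schedule are assumptions for illustration, not values from this commit.

```shell
# Hypothetical env.local fragment; BIRDHOUSE_CHECKOUT is an assumed path, adjust it.
BIRDHOUSE_CHECKOUT="${BIRDHOUSE_CHECKOUT:-/path/to/birdhouse-deploy}"
JOB_ENV="$BIRDHOUSE_CHECKOUT/birdhouse/components/scheduler/deploy_raven_testdata_to_thredds.env"

# Optional override; it must be set before sourcing the job file,
# because the job file only fills in values that are still empty.
DEPLOY_RAVEN_TESTDATA_SCHEDULE="*/15 * * * *"  # UTC

# Source only if present so older checkouts without this file keep working.
if [ -f "$JOB_ENV" ]; then
    . "$JOB_ENV"
    JOB_ENV_SOURCED=yes
else
    JOB_ENV_SOURCED=no
fi
```

Overrides go before the `.` line because the job file assigns a default only when a variable is empty.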
Showing 7 changed files with 332 additions and 2 deletions.
73 changes: 73 additions & 0 deletions
birdhouse/components/scheduler/deploy_raven_testdata_to_thredds.env
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
############################################################################## | ||
# Configuration vars, set in env.local before sourcing this file. | ||
# This job assumes the "scheduler" component is enabled. | ||
############################################################################## | ||
|
||
# Cronjob schedule to trigger deployment attempt. | ||
if [ -z "$DEPLOY_RAVEN_TESTDATA_SCHEDULE" ]; then | ||
DEPLOY_RAVEN_TESTDATA_SCHEDULE="*/30 * * * *" # UTC | ||
fi | ||
|
||
# Location for local cache of git clone to save bandwidth and time from always | ||
# re-cloning from scratch. | ||
if [ -z "$DEPLOY_RAVEN_TESTDATA_CHECKOUT_CACHE" ]; then | ||
DEPLOY_RAVEN_TESTDATA_CHECKOUT_CACHE="/data/deploy_data_cache/deploy_raven_testdata_to_thredds" | ||
fi | ||
|
||
# Location of deploy-data config file. | ||
# Provide a different config file to sync to a different location or include | ||
# more files in the sync. | ||
if [ -z "$DEPLOY_RAVEN_TESTDATA_CONFIG" ]; then | ||
DEPLOY_RAVEN_TESTDATA_CONFIG="${COMPOSE_DIR}/deployment/deploy-data-raven-testdata-to-thredds.yml" | ||
fi | ||
|
||
# Log file location. Default location under /var/log/PAVICS/ has built-in logrotate. | ||
if [ -z "$DEPLOY_RAVEN_TESTDATA_LOGFILE" ]; then | ||
DEPLOY_RAVEN_TESTDATA_LOGFILE="/var/log/PAVICS/deploy_raven_testdata_to_thredds.log" | ||
fi | ||
|
||
# Location of ssh private key for git clone over ssh, useful for private repos. | ||
# Raven does not need this since the Raven repo is public and is cloned over https. | ||
# This is here in case a custom config file is supplied with additional repos. | ||
#DEPLOY_RAVEN_TESTDATA_GIT_SSH_IDENTITY_FILE="/path/to/id_rsa" | ||
#DEPLOY_RAVEN_TESTDATA_GIT_SSH_IDENTITY_FILE=/home/vagrant/.ssh/id_rsa_git_ssh_read_only | ||
|
||
############################################################################## | ||
# End configuration vars | ||
############################################################################## | ||
|
||
|
||
if [ -z "`echo "$AUTODEPLOY_EXTRA_SCHEDULER_JOBS" | grep deploy_raven_testdata_to_thredds`" ]; then | ||
|
||
# Add job only if not already added (config is read twice during the | ||
# autodeploy process). | ||
|
||
LOGFILE_DIRNAME="`dirname "$DEPLOY_RAVEN_TESTDATA_LOGFILE"`" | ||
|
||
EXTRA_DOCKER_ARGS="" | ||
if [ -n "$DEPLOY_RAVEN_TESTDATA_GIT_SSH_IDENTITY_FILE" ]; then | ||
EXTRA_DOCKER_ARGS=" | ||
--volume ${DEPLOY_RAVEN_TESTDATA_GIT_SSH_IDENTITY_FILE}:${DEPLOY_RAVEN_TESTDATA_GIT_SSH_IDENTITY_FILE}:ro | ||
--env DEPLOY_DATA_GIT_SSH_IDENTITY_FILE=${DEPLOY_RAVEN_TESTDATA_GIT_SSH_IDENTITY_FILE}" | ||
fi | ||
|
||
export AUTODEPLOY_EXTRA_SCHEDULER_JOBS=" | ||
$AUTODEPLOY_EXTRA_SCHEDULER_JOBS | ||
|
||
- name: deploy_raven_testdata_to_thredds | ||
  comment: Auto-deploy Raven testdata to Thredds for Raven tutorial notebooks. | ||
  schedule: '$DEPLOY_RAVEN_TESTDATA_SCHEDULE' | ||
  command: '/deploy-data ${DEPLOY_RAVEN_TESTDATA_CONFIG}' | ||
  dockerargs: >- | ||
    --rm --name deploy_raven_testdata_to_thredds | ||
    --volume /var/run/docker.sock:/var/run/docker.sock:ro | ||
    --volume ${COMPOSE_DIR}/deployment/deploy-data:/deploy-data:ro | ||
    --volume ${DEPLOY_RAVEN_TESTDATA_CONFIG}:${DEPLOY_RAVEN_TESTDATA_CONFIG}:ro | ||
    --volume ${DEPLOY_RAVEN_TESTDATA_CHECKOUT_CACHE}:${DEPLOY_RAVEN_TESTDATA_CHECKOUT_CACHE}:rw | ||
    --volume ${LOGFILE_DIRNAME}:${LOGFILE_DIRNAME}:rw | ||
    --env DEPLOY_DATA_CHECKOUT_CACHE=${DEPLOY_RAVEN_TESTDATA_CHECKOUT_CACHE} | ||
    --env DEPLOY_DATA_LOGFILE=${DEPLOY_RAVEN_TESTDATA_LOGFILE} ${EXTRA_DOCKER_ARGS} | ||
  image: 'docker:19.03.6-git' | ||
" | ||
|
||
fi |
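The guard at the end of the file above (append the job only when its name is not already in `AUTODEPLOY_EXTRA_SCHEDULER_JOBS`) can be tried in isolation. This is a minimal sketch of the same idea; `JOBS`, `add_job_once` and `example_job` are made-up names, not part of the repo.

```shell
# Minimal sketch of the "append only once" guard, with a made-up job name.
JOBS=""

add_job_once() {
    # $1 is the job name; skip if the accumulated list already mentions it.
    if [ -z "`echo "$JOBS" | grep "$1"`" ]; then
        JOBS="$JOBS
- name: $1"
    fi
}

add_job_once example_job   # first read of the config: job is added
add_job_once example_job   # second read: the guard makes this a no-op

JOB_COUNT="`echo "$JOBS" | grep -c "name: example_job"`"
echo "example_job entries: $JOB_COUNT"
```

Because the check is a plain substring match, a job whose name is a prefix of another job name would also be treated as already present, so distinct names are worth keeping clearly different.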
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,164 @@ | ||
#!/bin/sh | ||
# Deploy data from git repo(s) to local folder(s). | ||
# | ||
# See sample input config in deploy-data.config.sample.yml for how to specify | ||
# which git repo(s), which git branch for each repo, which sub-folder(s) to | ||
# sync to which local folder(s) and rsync extra options for each sub-folder. | ||
# | ||
# The git repo clones are cached for faster subsequent runs and rsync is used | ||
# to only modify files that actually changed, to keep the file tree in sync and | ||
# to have include/exclude filter rules. None of these options are available | ||
# with regular 'cp'. | ||
# | ||
# Docker images are used for yq (yaml file parser) and rsync so this script has | ||
# very few install dependencies (only docker and git need to be installed | ||
# locally) and can run inside a very minimalistic image (the 'docker' Docker | ||
# image). | ||
# | ||
# Setting environment variable DEPLOY_DATA_LOGFILE='/path/to/logfile.log' | ||
# will redirect all STDOUT and STDERR to that logfile so this script will be | ||
# completely silent. | ||
# | ||
# Setting environment variable DEPLOY_DATA_GIT_SSH_IDENTITY_FILE='/path/to/id_rsa' | ||
# will allow git clone over ssh, useful for private repos. | ||
# | ||
# Other self-explanatory environment variables: DEPLOY_DATA_CHECKOUT_CACHE, | ||
# DEPLOY_DATA_YQ_IMAGE, DEPLOY_DATA_RSYNC_IMAGE. | ||
# | ||
|
||
if [ ! -z "$DEPLOY_DATA_LOGFILE" ]; then | ||
exec >>"$DEPLOY_DATA_LOGFILE" 2>&1 | ||
fi | ||
|
||
|
||
cleanup_on_exit() { | ||
set +x | ||
echo " | ||
datadeploy finished START_TIME=$START_TIME | ||
datadeploy finished END_TIME=`date -Isecond`" | ||
} | ||
|
||
trap cleanup_on_exit EXIT | ||
|
||
|
||
if [ -z "$DEPLOY_DATA_CHECKOUT_CACHE" ]; then | ||
DEPLOY_DATA_CHECKOUT_CACHE="/tmp/deploy-data-clone-cache" | ||
fi | ||
|
||
if [ -z "$DEPLOY_DATA_YQ_IMAGE" ]; then | ||
DEPLOY_DATA_YQ_IMAGE="mikefarah/yq:3.3.4" | ||
fi | ||
|
||
if [ -z "$DEPLOY_DATA_RSYNC_IMAGE" ]; then | ||
DEPLOY_DATA_RSYNC_IMAGE="eeacms/rsync:2.3" | ||
fi | ||
|
||
CONFIG_YML="$1" | ||
if [ -z "$CONFIG_YML" ]; then | ||
echo "ERROR: missing config.yml file" 1>&2 | ||
exit 2 | ||
else | ||
shift | ||
# Docker volume mount requires absolute path. | ||
CONFIG_YML="`realpath "$CONFIG_YML"`" | ||
fi | ||
|
||
|
||
yq() { | ||
docker run --rm --name deploy_data_yq -v "$CONFIG_YML:$CONFIG_YML:ro" "$DEPLOY_DATA_YQ_IMAGE" yq "$@" | ||
} | ||
|
||
# Empty value could mean typo in the keys in the config file. | ||
ensure_not_empty() { | ||
if [ -z "$*" ]; then | ||
echo "ERROR: value empty" 1>&2 | ||
exit 1 | ||
fi | ||
} | ||
|
||
|
||
START_TIME="`date -Isecond`" | ||
echo "========== | ||
datadeploy START_TIME=$START_TIME" | ||
|
||
set -x | ||
|
||
CHECKOUT_CACHE="`yq r -p v $CONFIG_YML config.checkout_cache`" | ||
if [ -z "$CHECKOUT_CACHE" ]; then | ||
CHECKOUT_CACHE="$DEPLOY_DATA_CHECKOUT_CACHE" | ||
fi | ||
|
||
GIT_SSH_IDENTITY_FILE="`yq r -p v $CONFIG_YML config.git_ssh_identity_file`" | ||
if [ -z "$GIT_SSH_IDENTITY_FILE" ]; then | ||
GIT_SSH_IDENTITY_FILE="$DEPLOY_DATA_GIT_SSH_IDENTITY_FILE" | ||
fi | ||
|
||
if [ ! -z "$GIT_SSH_IDENTITY_FILE" ]; then | ||
export GIT_SSH_COMMAND="ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o IdentityFile=$GIT_SSH_IDENTITY_FILE" | ||
fi | ||
|
||
GIT_REPO_URLS="`yq r -p v $CONFIG_YML deploy\[*\].repo_url`" | ||
ensure_not_empty "$GIT_REPO_URLS" | ||
REPO_NUM=0 | ||
|
||
for GIT_REPO_URL in $GIT_REPO_URLS; do | ||
|
||
GIT_BRANCH="`yq r -p v $CONFIG_YML --defaultValue origin/master deploy\[$REPO_NUM\].branch`" | ||
ensure_not_empty "$GIT_BRANCH" | ||
GIT_CHECKOUT_NAME="`yq r -p v $CONFIG_YML deploy\[$REPO_NUM\].checkout_name`" | ||
ensure_not_empty "$GIT_CHECKOUT_NAME" | ||
|
||
CLONE_DEST="$CHECKOUT_CACHE/$GIT_CHECKOUT_NAME" | ||
if [ ! -d "$CLONE_DEST" ]; then | ||
echo "checkout repo '$GIT_REPO_URL' on branch '$GIT_BRANCH' to '$CLONE_DEST'" | ||
git clone $GIT_REPO_URL $CLONE_DEST || exit 1 | ||
cd $CLONE_DEST | ||
git checkout $GIT_BRANCH | ||
else | ||
echo "refresh repo '$CLONE_DEST' on branch '$GIT_BRANCH'" | ||
cd $CLONE_DEST | ||
git remote -v # log remote, should match GIT_REPO_URL | ||
git clean -fdx # force, recurse into dirs, also remove ignored and untracked files | ||
git fetch --prune --all || exit 1 | ||
git checkout --force $GIT_BRANCH # force checkout to throw away local changes | ||
fi | ||
|
||
SRC_DIRS="`yq r -p v $CONFIG_YML deploy\[$REPO_NUM\].dir_maps\[*\].source_dir`" | ||
ensure_not_empty "$SRC_DIRS" | ||
DIR_NUM=0 | ||
|
||
for SRC_DIR in $SRC_DIRS; do | ||
DEST_DIR="`yq r -p v $CONFIG_YML deploy\[$REPO_NUM\].dir_maps\[$DIR_NUM\].dest_dir`" | ||
ensure_not_empty "$DEST_DIR" | ||
RSYNC_EXTRA_OPTS="`yq r -p v $CONFIG_YML deploy\[$REPO_NUM\].dir_maps\[$DIR_NUM\].rsync_extra_opts`" | ||
|
||
echo "sync '$SRC_DIR' to '$DEST_DIR'" | ||
DEST_DIR_PARENT="`dirname "$DEST_DIR"`" | ||
SRC_DIR_ABS_PATH="`pwd`/$SRC_DIR" | ||
USER_ID="`id -u`" | ||
GROUP_ID="`id -g`" | ||
|
||
# Ensure DEST_DIR_PARENT is created using current USER_ID/GROUP_ID for | ||
# next rsync to have proper write access. | ||
mkdir -p "$DEST_DIR_PARENT" | ||
|
||
# Rsync with --checksum to only update files that changed. | ||
docker run --rm --name deploy_data_rsync \ | ||
--volume $SRC_DIR_ABS_PATH:$SRC_DIR_ABS_PATH:ro \ | ||
--volume $DEST_DIR_PARENT:$DEST_DIR_PARENT:rw \ | ||
--user $USER_ID:$GROUP_ID \ | ||
--entrypoint /usr/bin/rsync \ | ||
$DEPLOY_DATA_RSYNC_IMAGE \ | ||
--recursive --links --checksum --delete \ | ||
--itemize-changes --human-readable --verbose \ | ||
--prune-empty-dirs $RSYNC_EXTRA_OPTS \ | ||
$SRC_DIR_ABS_PATH/ $DEST_DIR | ||
|
||
DIR_NUM=`expr $DIR_NUM + 1` | ||
done | ||
|
||
REPO_NUM=`expr $REPO_NUM + 1` | ||
|
||
done | ||
|
||
|
||
# vi: tabstop=8 expandtab shiftwidth=4 softtabstop=4 |
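The script above resolves `checkout_cache` in a fixed order: the value in the config file wins, then the `DEPLOY_DATA_CHECKOUT_CACHE` environment variable, then the built-in default. A small sketch of that precedence follows; `first_nonempty` is a hypothetical helper that does not exist in the script, and `/data/my-cache` is an assumed value.

```shell
# Hypothetical helper, not part of deploy-data: print the first non-empty argument.
first_nonempty() {
    for candidate in "$@"; do
        if [ -n "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

FROM_CONFIG=""                                  # what yq returns when the key is absent
FROM_ENV="/data/my-cache"                       # assumed DEPLOY_DATA_CHECKOUT_CACHE value
BUILTIN_DEFAULT="/tmp/deploy-data-clone-cache"  # default used by the script

CHECKOUT_CACHE="`first_nonempty "$FROM_CONFIG" "$FROM_ENV" "$BUILTIN_DEFAULT"`"
echo "$CHECKOUT_CACHE"
```

The same order applies to `git_ssh_identity_file`, except that it has no built-in default and simply stays unset.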
12 changes: 12 additions & 0 deletions
birdhouse/deployment/deploy-data-raven-testdata-to-thredds.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
deploy: | ||
#- repo_url: git@github.com:Ouranosinc/raven.git | ||
- repo_url: https://github.com/Ouranosinc/raven | ||
  # optional, default "origin/master" | ||
  # branch: | ||
  checkout_name: raven | ||
  dir_maps: | ||
  # rsync content below source_dir into dest_dir | ||
  - source_dir: tests/testdata | ||
    dest_dir: /data/datasets/testdata/raven | ||
    # only sync .nc files | ||
    rsync_extra_opts: --include=*/ --include=*.nc --exclude=* |
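To preview roughly which files the `--include=*/ --include=*.nc --exclude=*` rules would select, without running rsync or Docker, a `find` over the source directory gives an approximation. This is only a sketch: the throwaway tree and file names below are made up, and `find -type f` ignores symlinks, which the script's rsync call would copy because of `--links`.

```shell
# Build a throwaway tree that mimics a testdata folder (file names are made up).
WORK_DIR="`mktemp -d`"
mkdir -p "$WORK_DIR/tests/testdata/basin" "$WORK_DIR/tests/testdata/docs"
touch "$WORK_DIR/tests/testdata/basin/flow.nc" \
      "$WORK_DIR/tests/testdata/basin/params.rvp" \
      "$WORK_DIR/tests/testdata/docs/notes.txt"

# List only .nc files, relative to source_dir, which is what the filter rules keep.
SELECTED="`cd "$WORK_DIR/tests/testdata" && find . -type f -name '*.nc' | sort`"
echo "$SELECTED"

rm -rf "$WORK_DIR"
```

The `docs` directory contains no `.nc` file, so it does not appear in the listing; in the script, `--prune-empty-dirs` has the matching effect of not creating such directories under `dest_dir`.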
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
# Sample config file for deploy-data script. | ||
# | ||
# Many git repos are supported. For each repo, many mappings between source dir | ||
# and destination dir are supported. For each mapping, extra rsync options can | ||
# be provided to include/exclude a subset of files to keep in sync. | ||
|
||
config: | ||
  # optional, default "/tmp/deploy-data-clone-cache" | ||
  # can also be set by env var DEPLOY_DATA_CHECKOUT_CACHE | ||
  # setting in this config file takes precedence over env var | ||
  #checkout_cache: | ||
|
||
  # optional, default unset | ||
  # for git clone over ssh, useful for private repos | ||
  # can also be set by env var DEPLOY_DATA_GIT_SSH_IDENTITY_FILE | ||
  # setting in this config file takes precedence over env var | ||
  #git_ssh_identity_file: /path/to/id_rsa | ||
|
||
deploy: | ||
# this form if clone over ssh: git@github.com:Ouranosinc/jenkins-master.git | ||
- repo_url: https://github.com/Ouranosinc/jenkins-master | ||
  # optional, default "origin/master" | ||
  # branch: | ||
  checkout_name: jenkins-master | ||
  dir_maps: | ||
  # rsync content below source_dir into dest_dir | ||
  - source_dir: initial-jenkins-plugins-suggestion | ||
    dest_dir: /tmp/deploy-data-test-deploy/jenkins-plugins | ||
    # optional, useful for include/exclude filter rules | ||
    # rsync_extra_opts: | ||
|
||
- repo_url: https://github.com/Ouranosinc/jenkins-config | ||
  branch: origin/master | ||
  checkout_name: jenkins-config | ||
  dir_maps: | ||
  - source_dir: canarie-presentation/ | ||
    dest_dir: /tmp/deploy-data-test-deploy/canarie | ||
    # sync only .txt, .html and .gif files; other files already existing in | ||
    # dest_dir are ignored unless they have one of these extensions. | ||
    rsync_extra_opts: --include=*/ --include=*.txt --include=*.html --include=*.gif --exclude=* | ||
  - source_dir: jcasc | ||
    # remap dir jcasc inside previous dir canarie, without conflicting with | ||
    # previous canarie sync. This works because jcasc has no .txt, .html or .gif files. | ||
    dest_dir: /tmp/deploy-data-test-deploy/canarie/jcasc | ||
    rsync_extra_opts: | ||
|
||
- repo_url: https://github.com/Ouranosinc/pavics-sdi | ||
  # branch: | ||
  checkout_name: pavics-sdi | ||
  dir_maps: | ||
  # sync only 2 sub-dirs and .rst files under source/ | ||
  - source_dir: docs/ | ||
    dest_dir: /tmp/deploy-data-test-deploy/pavics-sdi | ||
    rsync_extra_opts: --include=*/ --include=source/tutorials/** --include=source/processes/** --include=source/*.rst --exclude=* | ||
  # sync only .yml files at the root of checkout | ||
  - source_dir: . | ||
    dest_dir: /tmp/deploy-data-test-deploy/pavics-sdi | ||
    rsync_extra_opts: --include=/ --include=*.yml --exclude=* | ||
  # move dir 'notebooks' one level higher in hierarchy | ||
  - source_dir: docs/source | ||
    dest_dir: /tmp/deploy-data-test-deploy/pavics-sdi | ||
    rsync_extra_opts: --include=*/ --include=notebooks/** --exclude=* |