Best Practices

Elizabeth Lee edited this page Mar 31, 2023 · 16 revisions

Running maps

Checklist for running a map with cholera-mapping-pipeline using the old pipeline.

Before launching a run

  1. You and at least one other member of the team should review the config file that will be run.
  2. Commit the reviewed config file to cholera-configs.
  3. Check which branch you are on in cholera-configs, cholera-mapping-pipeline, and cholera-covariates in the directories from which the model will be run.
  4. Git pull in cholera-configs, cholera-mapping-pipeline, and cholera-covariates in the directories from which the model will be run.
  5. Revert uncommitted changes to tracked files (git reset --hard).
  6. Remove untracked files and directories, including ignored ones (git clean -fxd).
  7. Re-create database_api_key.R.
  8. Review the cholera-configs GitHub Kanban board and issues for notes on the run you plan to launch.
  9. Reinstall taxdat.
  10. Check for and remove old data files in the data folder that might be related to your model run.
  11. Run taxdat::add_explicit_file_names_to_config on your config.
  12. Review the shell (.sh) script that will be used to launch your model run.
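
Steps 3 to 6 can be scripted so that every repo gets the same treatment. The following is a minimal sketch, not part of the pipeline: it assumes the three repos are sibling directories under a hypothetical BASE_DIR, and the helper names (run, prepare_repo, prepare_all_repos) are made up for illustration. It defaults to a dry run that only prints the commands.

```shell
# ASSUMPTION: the repos live side by side under BASE_DIR (hypothetical layout).
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
BASE_DIR="${BASE_DIR:-$HOME}"
DRY_RUN="${DRY_RUN:-1}"
REPOS="cholera-configs cholera-mapping-pipeline cholera-covariates"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Apply checklist steps 3-6 to one repo, in checklist order.
prepare_repo() {
  repo_dir="$1"
  run git -C "$repo_dir" rev-parse --abbrev-ref HEAD   # step 3: show current branch
  run git -C "$repo_dir" pull                          # step 4: update from remote
  run git -C "$repo_dir" reset --hard                  # step 5: drop uncommitted changes
  run git -C "$repo_dir" clean -fxd                    # step 6: drop untracked and ignored files
}

# Loop over all three repos.
prepare_all_repos() {
  for repo in $REPOS; do
    prepare_repo "$BASE_DIR/$repo"
  done
}
```

Call prepare_all_repos once with the default dry run and read the printed commands before setting DRY_RUN=0. Because git clean -fxd also deletes ignored files, anything you still need must be backed up or re-created afterwards.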

After launching a run

  1. Record the branch name(s) for all repos, the commit hash for cholera-mapping-pipeline, the config settings, and other model launch notes in the GitHub issue on the cholera-configs Kanban board.
  2. Update the Kanban board status.
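
To collect the branch name and commit hash for the launch notes, something like the following may help. It is a sketch under assumptions: the function names and the output format are hypothetical, and only standard git rev-parse calls are used.

```shell
# Format one line of launch notes (pure formatting, no git calls).
format_repo_note() {
  printf '%s: branch=%s commit=%s\n' "$1" "$2" "$3"
}

# Look up the current branch and commit of a checkout and format them.
describe_repo() {
  repo_dir="$1"
  branch="$(git -C "$repo_dir" rev-parse --abbrev-ref HEAD)"
  commit="$(git -C "$repo_dir" rev-parse HEAD)"
  format_repo_note "$(basename "$repo_dir")" "$branch" "$commit"
}
```

Running describe_repo on each checkout (for example, describe_repo path/to/cholera-mapping-pipeline, where the path is a placeholder) gives one line per repo that can be pasted into the issue.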

Between launching a run and the run ending

  1. Do not reinstall taxdat or make changes to the git repo.
  2. For many production runs, the data pull and diagnostic reports may be run on idmodeling while the Stan model is run on the ARCH Rockfish cluster. In these situations, model files may be transferred between the two servers using the scp or rsync commands.
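
A transfer with rsync might look like the sketch below. The remote address and both directory paths are placeholders (the real hostnames and folders are not given on this page), and the function name is hypothetical. By default it only prints the command.

```shell
# ASSUMPTION: REMOTE is a placeholder; substitute the real user and host.
REMOTE="${REMOTE:-user@remote-host}"

# Print (DRY_RUN=1, default) or run an rsync of a local directory to the remote.
transfer_model_files() {
  src_dir="$1"
  dest_dir="$2"
  # -a preserves permissions and timestamps, -v is verbose, -z compresses in transit.
  # The trailing slash on the source copies the directory contents, not the directory itself.
  set -- rsync -avz "$src_dir/" "$REMOTE:$dest_dir/"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}
```

rsync can resume an interrupted transfer by skipping files that already match, which is one reason to prefer it over scp for large model outputs.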

After the run is done

  1. Review the model logs to see if the run finished successfully.
  2. Always generate the country data report. Generate the data comparison report and generated quantities report as appropriate to the purpose of the model run. (Run all three for final production runs.)
  3. Update the Kanban board status as appropriate.

After the model diagnostic reports are run

As of 25 Feb 2022, the model diagnostic reports include the data comparison report and country data report RMD files.

  1. Commit logs and diagnostic reports to the appropriate cholera-mapping-reports folder.
  2. Commit intermediate model output files to the appropriate cholera-mapping-output folder (perform this step only for final report/manuscript runs). The model diagnostic reports may not produce intermediate model output files. However, intermediate model output files were generated in the creation of the Dec 2021 Gavi report, and these should be committed to cholera-mapping-output.
  3. Update the Kanban board status.
  4. Post the diagnostic reports in cholera-taxonomy Slack channel.
  5. Team members should then post comments on the Kanban board issue after reviewing diagnostic reports. Additional information on approving runs may be found on this wiki page.

For production runs

  1. Commit all **approved** model input and output files to the appropriate cholera-mapping-output folder.
  2. If you are in the process of running models for a production run but the model is not yet approved, you do not need to commit model files to the cholera-mapping-output repository. Instead, you may use scp or rsync for large file transfer.
  3. Do NOT delete any model files that may eventually become an approved production run. If, for example, you are running variations of a production run to see if we can improve model fit, do not delete the original Stan model output. To avoid accidental overwriting, we recommend transferring files to empty folders.
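
The recommendation to transfer into empty folders can be enforced with a small guard before copying. This is a sketch with hypothetical function names; it only inspects the destination and never deletes anything.

```shell
# Succeeds if the path does not exist yet or is a directory with no entries.
is_empty_or_missing() {
  dir="$1"
  if [ ! -e "$dir" ]; then return 0; fi
  if [ ! -d "$dir" ]; then return 1; fi
  # ls -A lists everything except . and ..; empty output means an empty directory.
  [ -z "$(ls -A "$dir")" ]
}

# Refuse to proceed when the destination already holds files.
require_empty_destination() {
  if is_empty_or_missing "$1"; then
    echo "ok: $1 is safe to use as a destination"
  else
    echo "refusing: $1 already contains files" >&2
    return 1
  fi
}
```

Calling require_empty_destination on the target folder before an scp or rsync makes an accidental overwrite of earlier Stan output less likely.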

Checklist when adding new parameters to the configuration file

  1. Use the parameter from the config directly in the pipeline code (e.g., use "config$<param_name>" in a pipeline script such as "prepare_stan_input.R").
  2. Add the parameter explicitly to the config writer script Analysis/R/write_batch_mapping_config_general.R and the related taxdat function (automate_mapping_config) in packages/taxdat/R/config_helpers.R.
  3. Add a config check function for each parameter, and encode the default value into the check function, in packages/taxdat/R/setup_helpers.R (e.g., check_<param_name>).
  4. Add a unit test using the testthat package for the newly added check function in packages/taxdat/tests/testthat/test_setup_helpers.R.
  5. Update the config file parameters wiki page on GitHub with the name of the parameter and what it is used for.

Triangular workflow and merging GitHub branches

Checklist for the GitHub workflow and merging branches. For additional detail, we roughly follow the Integration-Manager workflow described here, except that individuals work on branches rather than forks. For the following checklist, assume that dev is the production branch and that you are making updates on dev_a. Start by creating a new branch called dev_a from dev, and make all of your code changes in dev_a. Test that dev_a works as expected, ideally by writing unit tests, running a map, or both.

  1. Submit a PR for dev into dev_a and review changes. After reviewing and resolving conflicts, merge dev into dev_a.
  2. Submit a PR for dev_a into dev and review changes.
  3. Test that dev_a works as expected, ideally by running unit tests, integration tests, a map, or some combination of these.
  4. If tests produce the expected results, merge the pull request.
  5. If you no longer intend to make changes to dev_a, delete the branch. If there are more changes to make, continue working in dev_a and follow these steps again from the top when ready to merge.
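
The local git commands around this checklist can be sketched as follows. Several things here are assumptions: the remote is taken to be named origin, the function names are hypothetical, and the local merge in sync_feature_with_base stands in for the dev-into-dev_a PR of step 1 (the PR route adds the review that this sketch lacks). Pull requests themselves are still opened and merged on GitHub. The default is a dry run that prints the commands.

```shell
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the working branch (dev_a) from an up-to-date base branch (dev).
start_feature_branch() {
  base="$1"; feature="$2"
  run git checkout "$base"
  run git pull origin "$base"
  run git checkout -b "$feature"
}

# Step 1 equivalent: bring the latest base branch into the working branch.
sync_feature_with_base() {
  base="$1"; feature="$2"
  run git checkout "$feature"
  run git fetch origin
  run git merge "origin/$base"
  run git push origin "$feature"
}

# Step 5: remove the working branch locally and on the remote once it is merged.
delete_feature_branch() {
  base="$1"; feature="$2"
  run git checkout "$base"
  run git branch -d "$feature"        # -d refuses to delete an unmerged branch
  run git push origin --delete "$feature"
}
```

For the example in the text, the calls would be start_feature_branch dev dev_a, then sync_feature_with_base dev dev_a before opening the PR for dev_a into dev, and delete_feature_branch dev dev_a after the merge.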