Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updates sv manager and cluster creation scripts to utilize dataproc #3579

Merged
merged 1 commit into from
Sep 15, 2017

Conversation

SHuang-Broad
Copy link
Contributor

cluster timed self-termination (beta) feature.
Fixes #3574 considering that several people are suggesting beta from gcloud is safe enough.

@codecov-io
Copy link

codecov-io commented Sep 13, 2017

Codecov Report

Merging #3579 into master will increase coverage by 0.003%.
The diff coverage is n/a.

@@              Coverage Diff               @@
##              master    #3579       +/-   ##
==============================================
+ Coverage     79.937%   79.94%   +0.003%     
  Complexity     18005    18005               
==============================================
  Files           1206     1206               
  Lines          65578    65578               
  Branches       10245    10245               
==============================================
+ Hits           52421    52423        +2     
+ Misses          9061     9059        -2     
  Partials        4096     4096
Impacted Files Coverage Δ Complexity Δ
...e/hellbender/engine/spark/SparkContextFactory.java 73.973% <0%> (+2.74%) 11% <0%> (ø) ⬇️

Copy link
Contributor

@TedBrookings TedBrookings left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me. I haven't personally run tests but it looks so simple that I can't imagine it doesn't work. I might be good to also include a value for ‑‑max‑age since as is the cluster isn't good for running on more than one bam. I'd have the slight worry that some rogue process would be running so the cluster would never be idle. max-age could be set to a conservative value like 4 hours.

To be clear, the current changes are an improvement, so this is more a suggestion for a follow-up than a request for changes to the current pull request.

@@ -206,8 +213,10 @@ while true; do
INIT_ARGS="${INIT_SCRIPT} gs://${GCS_SAVE_PATH}/init/"
fi

echo "create_cluster.sh ${GATK_DIR} ${PROJECT_NAME} ${CLUSTER_NAME} ${GCS_REFERENCE_DIR} ${GCS_BAM_DIR} ${INIT_ARGS} 2>&1 | tee -a ${LOCAL_LOG_FILE}" | tee -a ${LOCAL_LOG_FILE}
create_cluster.sh ${GATK_DIR} ${PROJECT_NAME} ${CLUSTER_NAME} ${GCS_REFERENCE_DIR} ${GCS_BAM_DIR} ${INIT_ARGS} 2>&1 | tee -a ${LOCAL_LOG_FILE}
MAX_IDLE_MINUTES=${MAX_IDLE_MINUTES:-60m}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to move this to near the top of the file and then add in a new override field (e.g. -t|--timeout) with documentation.

@SHuang-Broad
Copy link
Contributor Author

Thanks @TedBrookings !

I did a creation and self-termination test run (without actually running the analysis to save time), and it worked as expected.

Also implemented the suggested changes:

  1. documentation in help
  2. upfront def instead of in the creation block

@SHuang-Broad SHuang-Broad merged commit 0dd357b into master Sep 15, 2017
@SHuang-Broad SHuang-Broad deleted the sh_dataproc_timeout branch September 15, 2017 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants